Abstract

We consider the problem of learning a coefficient vector $x_0 \in \mathbb{R}^N$ from noisy linear observations $y = Ax_0 + w \in \mathbb{R}^n$. In many contexts (ranging from model selection to image processing) it is desirable to construct a sparse estimator $\hat{x}$. In this case, a popular approach consists in solving an $\ell_1$-penalized least squares problem known as the LASSO or Basis Pursuit DeNoising (BPDN).

For sequences of matrices $A$ of increasing dimensions, with independent gaussian entries, we prove that the normalized risk of the LASSO converges to a limit, and we obtain an explicit expression for this limit. Our result is the first rigorous derivation of an explicit formula for the asymptotic mean square error of the LASSO for random instances. The proof technique is based on the analysis of AMP, a recently developed efficient algorithm that is inspired by graphical models ideas.
1 Introduction

Let $x_0 \in \mathbb{R}^N$ be an unknown vector, and assume that a vector $y \in \mathbb{R}^n$ of noisy linear measurements of $x_0$ is available. The problem of reconstructing $x_0$ from such measurements arises in a number of disciplines, ranging from statistical learning to signal processing. In many contexts the measurements are modeled by

$$y = Ax_0 + w, \qquad (1.1)$$



where $A \in \mathbb{R}^{n\times N}$ is a known measurement matrix, and $w$ is a noise vector.

The LASSO or Basis Pursuit Denoising (BPDN) is a method for reconstructing the unknown vector $x_0$ given $y$, $A$, and is particularly useful when one seeks sparse solutions. For given $A$, $y$, one considers the cost function $\mathcal{C}_{A,y}: \mathbb{R}^N \to \mathbb{R}$ defined by

$$\mathcal{C}_{A,y}(x) = \frac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1, \qquad (1.2)$$

with $\lambda > 0$. The original signal is estimated by

$$\hat{x}(\lambda; A, y) = \arg\min_x \mathcal{C}_{A,y}(x). \qquad (1.3)$$

In what follows we shall often omit the arguments $A$, $y$ (and occasionally $\lambda$) from the above notations. We will also use $\hat{x}(\lambda; N)$ to emphasize the $N$-dependence. Further $\|v\|_p \equiv (\sum_{i=1}^m |v_i|^p)^{1/p}$ denotes the $\ell_p$-norm of a vector $v \in \mathbb{R}^m$ (the subscript $p$ will often be omitted if $p = 2$).
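For concreteness, the cost (1.2) and the estimator (1.3) can be evaluated numerically. The sketch below is pure Python (no external packages) and approximates the minimizer with ISTA (iterative soft thresholding, i.e. proximal gradient descent), a standard first-order method swapped in here for illustration; it is not the solver used in Section 2, which relies on CVX and OWLQN. The names `lasso_cost`, `soft_threshold` and `lasso_ista` are illustrative, not taken from any library.

```python
import math


def lasso_cost(A, y, x, lam):
    """C_{A,y}(x) = 0.5 * ||y - A x||_2^2 + lam * ||x||_1, cf. Eq. (1.2)."""
    residual = [yi - sum(a * xj for a, xj in zip(row, x)) for row, yi in zip(A, y)]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(xj) for xj in x)


def soft_threshold(u, theta):
    """Scalar soft thresholding (the function eta of Eq. (1.4))."""
    if u > theta:
        return u - theta
    if u < -theta:
        return u + theta
    return 0.0


def lasso_ista(A, y, lam, iters=500):
    """Approximate the minimizer (1.3) by ISTA, i.e. proximal gradient descent."""
    n, N = len(A), len(A[0])
    # Step size 1/L with L >= ||A||_2^2; the squared Frobenius norm is a crude but valid bound.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * N
    for _ in range(iters):
        r = [yi - sum(a * xj for a, xj in zip(row, x)) for row, yi in zip(A, y)]
        # Gradient of 0.5 * ||y - A x||^2 is -A^*(y - A x).
        grad = [-sum(A[i][j] * r[i] for i in range(n)) for j in range(N)]
        x = [soft_threshold(xj - gj / L, lam / L) for xj, gj in zip(x, grad)]
    return x


# Sanity check: with A = I the problem decouples and the minimizer is soft thresholding of y at lam.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y3 = [2.0, -0.5, 0.3]
x_hat = lasso_ista(I3, y3, lam=1.0)
print([round(v, 6) for v in x_hat], lasso_cost(I3, y3, x_hat, 1.0))
```

With $A = I$ the minimizer is $\eta(y;\lambda)$ applied componentwise, so the printed vector should be close to $(1, 0, 0)$.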
A large and rapidly growing literature is devoted to (i) developing fast algorithms for solving the optimization problem (1.3); (ii) characterizing the performance and optimality of the estimator $\hat{x}$. We refer to Section 1.3 for an unavoidably incomplete overview.

Despite such substantial effort, and many remarkable achievements, our understanding of (1.3) is not even comparable to the one we have of more classical topics in statistics and estimation theory. For instance, the best bound on the mean square error (MSE) of the estimator (1.3), i.e. on the quantity $N^{-1}\|\hat{x} - x_0\|^2$, was proved by Candes, Romberg and Tao [CRT06] (who in fact did not consider the LASSO but a related optimization problem). Their result estimates the mean square error only up to an unknown numerical multiplicative factor. Work by Candes and Tao [CT07] on the analogous Dantzig selector upper bounds the mean square error up to a factor $C\log N$, under somewhat different assumptions.

The objective of this paper is to complement this type of 'rough but robust' bounds by proving asymptotically exact expressions for the mean square error. Our asymptotic result holds almost surely for sequences of random matrices $A$ with fixed aspect ratio and independent gaussian entries. While this setting is admittedly specific, the careful study of such matrix ensembles has a long tradition both in statistics and communications theory and has spurred many insights [Joh06, Tel99].

Although our rigorous results are asymptotic in the problem dimensions, numerical simulations have shown that they are accurate already on problems with a few hundreds of variables. Further, they seem to enjoy a remarkable universality property and to hold for a fairly broad family of matrices [DMM10]. Both these phenomena are analogous to ones in random matrix theory, where delicate asymptotic properties of gaussian ensembles were subsequently proved to hold for much broader classes of random matrices. Also, asymptotic statements in random matrix theory have been replaced over time by concrete probability bounds in finite dimensions. Of course the optimization problem (1.2) is not immediately related to spectral properties of the random matrix $A$. As a consequence, universality and non-asymptotic results in random matrix theory cannot be directly exported to the present problem. Nevertheless, we expect analogous developments to be possible here as well.

Our proofs are based on the analysis of an efficient iterative algorithm first proposed in [DMM09], and called AMP, for approximate message passing. The algorithm is inspired by belief propagation on graphical models, although the resulting iteration is significantly simpler (and scales linearly in the number of nodes). Extensive simulations [DMM10] showed that, in a number of settings, the performance of AMP is statistically indistinguishable from that of the LASSO, while its complexity is essentially as low as that of the simplest greedy algorithms.

The proof technique just described is new. Earlier literature analyzes the convex optimization problem (1.3) (or similar problems) by a clever construction of an approximate optimum, or of a dual witness. Such constructions are largely explicit. Here instead we prove an asymptotically exact characterization of a rather non-trivial iterative algorithm. The algorithm is then proved to converge to the exact optimum.

1.1 Definitions 



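In order to define the AMP algorithm, we denote by $\eta: \mathbb{R}\times\mathbb{R}_+ \to \mathbb{R}$ the soft thresholding function

$$\eta(x;\theta) = \begin{cases} x - \theta & \text{if } x > \theta,\\ 0 & \text{if } -\theta \le x \le \theta,\\ x + \theta & \text{otherwise.}\end{cases} \qquad (1.4)$$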
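The thresholding function (1.4) and the derivative $\eta'$ used in (1.5) take only a few lines. The sketch below also checks, by brute-force minimization on a grid, the standard fact that $\eta(x;\theta) = \arg\min_u \{\tfrac12(u-x)^2 + \theta|u|\}$, which is why soft thresholding is tied to $\ell_1$ penalties. The helper names are illustrative, and the value returned by `eta_prime` at the kinks $|x| = \theta$ is an arbitrary convention of this sketch.

```python
def eta(x, theta):
    """Soft thresholding eta(x; theta), Eq. (1.4)."""
    if x > theta:
        return x - theta
    if x < -theta:
        return x + theta
    return 0.0


def eta_prime(x, theta):
    """Derivative of eta in its first argument (undefined at |x| = theta; 0 is returned there)."""
    return 1.0 if abs(x) > theta else 0.0


def prox_by_grid(x, theta, lo=-5.0, hi=5.0, steps=20000):
    """Brute-force argmin over u of 0.5 * (u - x)**2 + theta * |u| on a grid (for checking only)."""
    grid = (lo + (hi - lo) * k / steps for k in range(steps + 1))
    return min(grid, key=lambda u: 0.5 * (u - x) ** 2 + theta * abs(u))


for x in (2.5, 0.4, -1.7):
    print(x, eta(x, 1.0), prox_by_grid(x, 1.0))
```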
The algorithm constructs a sequence of estimates $x^t \in \mathbb{R}^N$, and residuals $z^t \in \mathbb{R}^n$, according to the iteration

$$\begin{aligned} x^{t+1} &= \eta(A^*z^t + x^t;\theta_t),\\ z^t &= y - Ax^t + \frac{1}{\delta}z^{t-1}\langle\eta'(A^*z^{t-1} + x^{t-1};\theta_{t-1})\rangle, \end{aligned} \qquad (1.5)$$

initialized with $x^0 = 0$. Here $A^*$ denotes the transpose of matrix $A$, and $\eta'(\,\cdot\,;\,\cdot\,)$ is the derivative of the soft thresholding function with respect to its first argument. Given a scalar function $f$ and a vector $u \in \mathbb{R}^m$, we let $f(u)$ denote the vector $(f(u_1),\dots,f(u_m)) \in \mathbb{R}^m$ obtained by applying $f$ componentwise. Finally $\langle u\rangle \equiv m^{-1}\sum_{i=1}^m u_i$ is the average of the vector $u \in \mathbb{R}^m$.
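A direct, unoptimized transcription of the iteration (1.5) follows. It is a sketch under assumptions that the text above does not fix: the thresholds use the policy $\theta_t = \alpha\tau_t$ adopted later in this section, with $\tau_t$ estimated from the data as $\|z^t\|_2/\sqrt{n}$ (the quantity whose limit is identified in Lemma 4.1) rather than computed from state evolution; the toy instance (sizes, noise standard deviation $0.1$, exactly 26 entries equal to $\pm1$, $\alpha = 1.1$) is chosen only for illustration. The helper names are hypothetical.

```python
import math
import random


def eta(u, theta):
    """Soft thresholding, Eq. (1.4)."""
    return u - theta if u > theta else (u + theta if u < -theta else 0.0)


def amp(A, y, alpha, iters=30):
    """AMP iteration (1.5), started from x^0 = 0, with thresholds theta_t = alpha * tau_t.

    Assumption of this sketch: tau_t is estimated on the fly as ||z^t||_2 / sqrt(n)
    (cf. Lemma 4.1) instead of being computed from the recursion (1.9).
    """
    n, N = len(A), len(A[0])
    delta = n / N
    x = [0.0] * N
    z = list(y)  # z^0 = y - A x^0 = y because x^0 = 0
    for _ in range(iters):
        theta = alpha * math.sqrt(sum(zi * zi for zi in z) / n)
        u = [x[j] + sum(A[i][j] * z[i] for i in range(n)) for j in range(N)]  # A^* z^t + x^t
        x = [eta(uj, theta) for uj in u]  # x^{t+1}
        # Onsager coefficient (1/delta) * <eta'(A^* z^t + x^t; theta_t)>, averaged over the N entries.
        onsager = sum(1.0 for uj in u if abs(uj) > theta) / N / delta
        Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
        z = [yi - axi + onsager * zi for yi, axi, zi in zip(y, Ax, z)]  # z^{t+1}
    return x, z


# Toy instance: N = 200, delta = 0.64, 26 entries equal to +-1, gaussian A with N(0, 1/n) entries.
rng = random.Random(0)
N, n = 200, 128
x0 = [0.0] * N
for i in rng.sample(range(N), 26):
    x0[i] = rng.choice((1.0, -1.0))
A = [[rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(N)] for _ in range(n)]
y = [sum(a * xj for a, xj in zip(row, x0)) + rng.gauss(0.0, 0.1) for row in A]
x_amp, z_amp = amp(A, y, alpha=1.1)
mse = sum((a - b) ** 2 for a, b in zip(x_amp, x0)) / N
print(mse, sum(v * v for v in x0) / N)
```

The Onsager coefficient $\frac{1}{\delta}\langle\eta'\rangle$ is simply the number of non-zero entries of $x^{t+1}$ divided by $n$; dropping it turns the loop into plain iterative thresholding, which is not expected to be tracked by the state evolution recursion introduced below.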

As already mentioned, we will consider sequences of instances of increasing sizes, along which the 
LASSO behavior has a non-trivial limit. 

Definition 1. The sequence of instances $\{x_0(N), w(N), A(N)\}_{N\in\mathbb{N}}$ indexed by $N$ is said to be a converging sequence if $x_0(N) \in \mathbb{R}^N$, $w(N) \in \mathbb{R}^n$, $A(N) \in \mathbb{R}^{n\times N}$ with $n = n(N)$ is such that $n/N \to \delta \in (0,\infty)$, and in addition the following conditions hold:

(a) The empirical distribution of the entries of $x_0(N)$ converges weakly to a probability measure $p_{X_0}$ on $\mathbb{R}$ with bounded second moment. Further $N^{-1}\sum_{i=1}^N x_{0,i}(N)^2 \to \mathbb{E}_{p_{X_0}}\{X_0^2\}$.

(b) The empirical distribution of the entries of $w(N)$ converges weakly to a probability measure $p_W$ on $\mathbb{R}$ with bounded second moment. Further $n^{-1}\sum_{i=1}^n w_i(N)^2 \to \mathbb{E}_{p_W}\{W^2\}$.

(c) If $\{e_i\}_{1\le i\le N}$, $e_i \in \mathbb{R}^N$ denotes the standard basis, then $\max_{i\in[N]}\|A(N)e_i\|_2$, $\min_{i\in[N]}\|A(N)e_i\|_2 \to 1$, as $N\to\infty$ where $[N] \equiv \{1,2,\dots,N\}$.

Let us stress that our proof only applies to a subclass of converging sequences (namely for gaussian 
measurement matrices A(N)). The notion of converging sequences is however important since it 
defines a class of problem instances to which the ideas developed below might be generalizable. 

For a converging sequence of instances, and an arbitrary sequence of thresholds $\{\theta_t\}_{t\ge0}$ (independent of $N$), the asymptotic behavior of the recursion (1.5) can be characterized as follows.

Define the sequence $\{\tau_t^2\}_{t\ge0}$ by setting $\tau_0^2 = \sigma^2 + \mathbb{E}\{X_0^2\}/\delta$ (for $X_0 \sim p_{X_0}$ and $\sigma^2 \equiv \mathbb{E}\{W^2\}$, $W \sim p_W$) and letting, for all $t \ge 0$:

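$$\tau_{t+1}^2 = F(\tau_t^2, \theta_t), \qquad (1.6)$$

$$F(\tau^2, \theta) \equiv \sigma^2 + \frac{1}{\delta}\mathbb{E}\{[\eta(X_0 + \tau Z;\theta) - X_0]^2\}, \qquad (1.7)$$

where $Z \sim N(0,1)$ is independent of $X_0$. Notice that the function $F$ depends implicitly on the law $p_{X_0}$.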
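For a prior with finitely many atoms, the expectation in (1.7) has a closed form in terms of $\Phi$ and $\phi$ (defined in Proposition 1.2 below). It follows from a routine computation not spelled out in the text: split according to whether $x + \tau Z > \theta$, $x + \tau Z < -\theta$, or neither, and use $\int_a^\infty z\phi(z)\,dz = \phi(a)$ and $\int_a^\infty z^2\phi(z)\,dz = a\phi(a) + \Phi(-a)$. The sketch below implements this for a prior given as a list of (value, probability) pairs; the function names and the example prior (that of Figure 1) are illustrative.

```python
import math


def Phi(z):
    """Standard gaussian CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def phi(z):
    """Standard gaussian density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def soft_mse(x, tau, theta):
    """E{[eta(x + tau Z; theta) - x]^2} for Z ~ N(0, 1), in closed form (requires tau > 0)."""
    a, b = (theta - x) / tau, (-theta - x) / tau  # eta is active for Z > a or Z < b
    upper = tau**2 * (a * phi(a) + Phi(-a)) - 2 * tau * theta * phi(a) + theta**2 * Phi(-a)  # error tau Z - theta
    lower = tau**2 * (-b * phi(b) + Phi(b)) - 2 * tau * theta * phi(b) + theta**2 * Phi(b)  # error tau Z + theta
    middle = x * x * (Phi(a) - Phi(b))  # output is 0, so the error is -x
    return upper + lower + middle


def F(tau2, theta, prior, delta, sigma2):
    """State evolution map (1.7) for a discrete prior [(value, probability), ...]."""
    tau = math.sqrt(tau2)
    return sigma2 + sum(p * soft_mse(x, tau, theta) for x, p in prior) / delta


prior = [(1.0, 0.064), (-1.0, 0.064), (0.0, 0.872)]  # the example prior of Figure 1
print(F(0.4, 2.0 * math.sqrt(0.4), prior, delta=0.64, sigma2=0.2))
```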

We say a function $\psi: \mathbb{R}^2 \to \mathbb{R}$ is pseudo-Lipschitz if there exists a constant $L > 0$ such that for all $x, y \in \mathbb{R}^2$: $|\psi(x) - \psi(y)| \le L(1 + \|x\|_2 + \|y\|_2)\|x - y\|_2$. (This is a special case of the definition used in [BM10] where such a function is called pseudo-Lipschitz of order 2.)

Our next proposition was conjectured in [DMM09] and proved in [BM10]. It shows that the behavior of AMP can be tracked by the above one-dimensional recursion. We often refer to this prediction as state evolution.
Theorem 1.1 ([BM10]). Let $\{x_0(N), w(N), A(N)\}_{N\in\mathbb{N}}$ be a converging sequence of instances with the entries of $A(N)$ iid normal with mean 0 and variance $1/n$ and let $\psi: \mathbb{R}\times\mathbb{R} \to \mathbb{R}$ be a pseudo-Lipschitz function. Then, almost surely

$$\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^N\psi(x_i^{t+1}, x_{0,i}) = \mathbb{E}\{\psi(\eta(X_0 + \tau_tZ;\theta_t), X_0)\}, \qquad (1.8)$$

where $Z \sim N(0,1)$ is independent of $X_0 \sim p_{X_0}$.

In order to establish the connection with the LASSO, a specific policy has to be chosen for the thresholds $\{\theta_t\}_{t\ge0}$. Throughout this paper we will take $\theta_t = \alpha\tau_t$ with $\alpha$ fixed. In other words, the sequence $\{\tau_t\}_{t\ge0}$ is given by the recursion

$$\tau_{t+1}^2 = F(\tau_t^2, \alpha\tau_t). \qquad (1.9)$$

This choice enjoys several convenient properties [DMM09].



1.2 Main result 

Before stating our results, we have to describe a calibration mapping between $\alpha$ and $\lambda$ that was introduced in [DMM10].

Let us start by stating some convenient properties of the state evolution recursion. 

Proposition 1.2 ([DMM09]). Let $\alpha_{\min} = \alpha_{\min}(\delta)$ be the unique non-negative solution of the equation

$$(1 + \alpha^2)\Phi(-\alpha) - \alpha\phi(\alpha) = \frac{\delta}{2}, \qquad (1.10)$$

with $\phi(z) \equiv e^{-z^2/2}/\sqrt{2\pi}$ the standard gaussian density and $\Phi(z) \equiv \int_{-\infty}^z \phi(x)\,dx$.

For any $\sigma^2 > 0$, $\alpha > \alpha_{\min}(\delta)$, the fixed point equation $\tau^2 = F(\tau^2, \alpha\tau)$ admits a unique solution. Denoting by $\tau_* = \tau_*(\alpha)$ this solution, we have $\lim_{t\to\infty}\tau_t = \tau_*(\alpha)$. Further the convergence takes place for any initial condition and is monotone. Finally $\left|\frac{dF}{d\tau^2}(\tau^2, \alpha\tau)\right| < 1$ at $\tau = \tau_*$.

For the reader's convenience, a proof of this statement is provided in Appendix A.1.
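Both ingredients of Proposition 1.2 are easy to compute numerically. The sketch below finds $\alpha_{\min}(\delta)$ by bisection: the left-hand side of (1.10) decreases from $1/2$ at $\alpha = 0$ towards $0$, so for $0 < \delta < 1$ the root is bracketed (the case $\delta \ge 1$ is simply not handled here). It then obtains $\tau_*(\alpha)$ by iterating (1.9) from $\tau_0^2$ and records the iterates, so that the monotone convergence can be observed. Function names, tolerances and the example prior (that of Figure 1) are illustrative choices.

```python
import math


def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def alpha_min(delta, tol=1e-12):
    """Root of (1 + a^2) Phi(-a) - a phi(a) = delta / 2, Eq. (1.10), by bisection (0 < delta < 1)."""
    if not 0.0 < delta < 1.0:
        raise ValueError("this sketch only handles 0 < delta < 1")

    def g(a):  # decreasing in a, with g(0) = (1 - delta) / 2 > 0
        return (1 + a * a) * Phi(-a) - a * phi(a) - delta / 2

    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)


def soft_mse(x, tau, theta):
    """E{[eta(x + tau Z; theta) - x]^2}, Z ~ N(0, 1), in closed form."""
    a, b = (theta - x) / tau, (-theta - x) / tau
    upper = tau**2 * (a * phi(a) + Phi(-a)) - 2 * tau * theta * phi(a) + theta**2 * Phi(-a)
    lower = tau**2 * (-b * phi(b) + Phi(b)) - 2 * tau * theta * phi(b) + theta**2 * Phi(b)
    return upper + lower + x * x * (Phi(a) - Phi(b))


def tau_star(alpha, prior, delta, sigma2, tol=1e-13, max_iter=10000):
    """Iterate tau_{t+1}^2 = F(tau_t^2, alpha * tau_t), Eq. (1.9), from tau_0^2 = sigma2 + E{X0^2}/delta."""
    tau2 = sigma2 + sum(p * x * x for x, p in prior) / delta
    history = [tau2]
    for _ in range(max_iter):
        tau = math.sqrt(tau2)
        new = sigma2 + sum(p * soft_mse(x, tau, alpha * tau) for x, p in prior) / delta
        history.append(new)
        if abs(new - tau2) < tol:
            return math.sqrt(new), history
        tau2 = new
    raise RuntimeError("no convergence; is alpha > alpha_min(delta)?")


prior = [(1.0, 0.064), (-1.0, 0.064), (0.0, 0.872)]
print(alpha_min(0.64))
ts, hist = tau_star(2.0, prior, 0.64, 0.2)
print(ts ** 2, len(hist))
```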
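We then define the function $\alpha \mapsto \lambda(\alpha)$ on $(\alpha_{\min}(\delta), \infty)$ by

$$\lambda(\alpha) \equiv \alpha\tau_*\left(1 - \frac{1}{\delta}\mathbb{E}\{\eta'(X_0 + \tau_*Z;\alpha\tau_*)\}\right). \qquad (1.11)$$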
This function defines a correspondence (calibration) between the sequence of thresholds $\{\theta_t\}_{t\ge0}$ and the regularization parameter $\lambda$. It should be intuitively clear that larger $\lambda$ corresponds to larger thresholds and hence larger $\alpha$, since both cases yield smaller estimates of $x_0$.

In the following we will need to invert this function. We thus define $\alpha: (0,\infty) \to (\alpha_{\min},\infty)$ in such a way that

$$\alpha(\lambda) \in \{a \in (\alpha_{\min},\infty) : \lambda(a) = \lambda\}. \qquad (1.12)$$

The next result implies that the set on the right-hand side is non-empty and therefore the function $\lambda \mapsto \alpha(\lambda)$ is well defined.
Figure 1: Mapping $\tau^2 \mapsto F(\tau^2, \alpha\tau)$ for $\alpha = 2$, $\delta = 0.64$, $\sigma^2 = 0.2$, $p_{X_0}(\{+1\}) = p_{X_0}(\{-1\}) = 0.064$ and $p_{X_0}(\{0\}) = 0.872$.



Proposition 1.3 ([DMM10]). The function $\alpha \mapsto \lambda(\alpha)$ is continuous on the interval $(\alpha_{\min},\infty)$ with $\lambda(\alpha_{\min}+) = -\infty$ and $\lim_{\alpha\to\infty}\lambda(\alpha) = \infty$.

Therefore the function $\lambda \mapsto \alpha(\lambda)$ satisfying Eq. (1.12) exists.

A proof of this statement is provided in Section A.2. We will denote by $\mathcal{A} = \alpha((0,\infty))$ the image of the function $\alpha$. Notice that the definition of $\alpha$ is a priori not unique. We will see that uniqueness follows from our main theorem.

Examples of the mappings $\tau^2 \mapsto F(\tau^2, \alpha\tau)$, $\alpha \mapsto \tau_*(\alpha)$ and $\alpha \mapsto \lambda(\alpha)$ are presented in Figures 1, 2 and 3 respectively.
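Numerically, the calibration (1.11) and its inverse (1.12) amount to one fixed-point computation per value of $\alpha$ plus a one-dimensional root search. Since $\mathbb{E}\{\eta'(x + \tau Z;\theta)\} = \mathbb{P}\{|x + \tau Z| > \theta\}$, the sketch below evaluates $\lambda(\alpha)$ in closed form once $\tau_*$ is known, and inverts it by bisection; the continuity stated in Proposition 1.3 is what makes bisection on a bracket with $\lambda(\alpha_{lo}) < \lambda < \lambda(\alpha_{hi})$ legitimate. Supplying such a bracket is left to the caller, an assumption of this sketch, and all names are hypothetical.

```python
import math


def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def soft_mse(x, tau, theta):
    """E{[eta(x + tau Z; theta) - x]^2}, Z ~ N(0, 1), in closed form."""
    a, b = (theta - x) / tau, (-theta - x) / tau
    upper = tau**2 * (a * phi(a) + Phi(-a)) - 2 * tau * theta * phi(a) + theta**2 * Phi(-a)
    lower = tau**2 * (-b * phi(b) + Phi(b)) - 2 * tau * theta * phi(b) + theta**2 * Phi(b)
    return upper + lower + x * x * (Phi(a) - Phi(b))


def tau_star(alpha, prior, delta, sigma2, tol=1e-14, max_iter=100000):
    """Fixed point of tau^2 = F(tau^2, alpha * tau), obtained by iterating Eq. (1.9)."""
    tau2 = sigma2 + sum(p * x * x for x, p in prior) / delta
    for _ in range(max_iter):
        tau = math.sqrt(tau2)
        new = sigma2 + sum(p * soft_mse(x, tau, alpha * tau) for x, p in prior) / delta
        if abs(new - tau2) < tol:
            return math.sqrt(new)
        tau2 = new
    raise RuntimeError("state evolution did not converge")


def calibrated_lambda(alpha, prior, delta, sigma2):
    """lambda(alpha) of Eq. (1.11)."""
    tau = tau_star(alpha, prior, delta, sigma2)
    theta = alpha * tau
    # E{eta'(x + tau Z; theta)} = P{x + tau Z > theta} + P{x + tau Z < -theta}
    mean_eta_prime = sum(p * (Phi((x - theta) / tau) + Phi((-theta - x) / tau)) for x, p in prior)
    return theta * (1.0 - mean_eta_prime / delta)


def alpha_of_lambda(lam, prior, delta, sigma2, lo, hi, tol=1e-10):
    """Solve lambda(alpha) = lam by bisection, given lambda(lo) < lam < lambda(hi)."""
    def f(a):
        return calibrated_lambda(a, prior, delta, sigma2) - lam

    if not f(lo) < 0.0 < f(hi):
        raise ValueError("the bracket [lo, hi] does not straddle lam")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0.0 else (lo, mid)
    return 0.5 * (lo + hi)


prior = [(1.0, 0.064), (-1.0, 0.064), (0.0, 0.872)]
a = alpha_of_lambda(1.0, prior, 0.64, 0.2, lo=1.0, hi=4.0)
print(a, calibrated_lambda(a, prior, 0.64, 0.2))
```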

We can now state our main result. 

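Theorem 1.4. Let $\{x_0(N), w(N), A(N)\}_{N\in\mathbb{N}}$ be a converging sequence of instances with the entries of $A(N)$ iid normal with mean 0 and variance $1/n$. Denote by $\hat{x}(\lambda;N)$ the LASSO estimator for instance $(x_0(N), w(N), A(N))$, with $\sigma^2, \lambda > 0$, $\mathbb{P}\{X_0 \neq 0\} > 0$ and let $\psi: \mathbb{R}\times\mathbb{R} \to \mathbb{R}$ be a pseudo-Lipschitz function. Then, almost surely

$$\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^N\psi(\hat{x}_i, x_{0,i}) = \mathbb{E}\{\psi(\eta(X_0 + \tau_*Z;\theta_*), X_0)\}, \qquad (1.13)$$

where $Z \sim N(0,1)$ is independent of $X_0 \sim p_{X_0}$, $\tau_* = \tau_*(\alpha(\lambda))$ and $\theta_* = \alpha(\lambda)\tau_*(\alpha(\lambda))$.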
As a corollary, the function $\lambda \mapsto \alpha(\lambda)$ is indeed uniquely defined.

Corollary 1.5. For any $\lambda, \sigma^2 > 0$ there exists a unique $\alpha > \alpha_{\min}$ such that $\lambda(\alpha) = \lambda$ (with the function $\alpha \mapsto \lambda(\alpha)$ defined as in Eq. (1.11)).

Hence the function $\lambda \mapsto \alpha(\lambda)$ is continuous non-decreasing with $\alpha((0,\infty)) \equiv \mathcal{A} = (\alpha_0,\infty)$.

The proof of this corollary (which uses Theorem 1.4) is provided in Appendix A.3.

The assumption of a converging problem-sequence is important for the result to hold, while the hypothesis of gaussian measurement matrices $A(N)$ is necessary for the proof technique to be correct. On the other hand, the restrictions $\lambda, \sigma^2 > 0$, and $\mathbb{P}\{X_0 \neq 0\} > 0$ (whence $\tau_* \neq 0$, using Eq. (1.11)) are made in order to avoid technical complications due to degenerate cases. Such cases can be resolved by continuity arguments.

The proof of Theorem 1.4 is given in Section 3.

1.3 Related work 

The LASSO was introduced in [Tib96, CD95]. Several papers provide performance guarantees for the LASSO or similar convex optimization methods [CRT06, CT07], by proving upper bounds on the resulting mean square error. These works assume an appropriate 'isometry' condition to hold for $A$. While such conditions hold with high probability for some random matrices, they are often difficult to verify explicitly. Further, they are only applicable to very sparse vectors $x_0$. These restrictions are intrinsic to the worst-case point of view developed in [CRT06, CT07].

Guarantees have been proved for correct support recovery in [ZY06], under an appropriate 'incoherence' assumption on $A$. While support recovery is an interesting conceptualization for some applications (e.g. model selection), the metric considered in the present paper (mean square error) provides complementary information and is quite standard in many different fields.

Closer to the spirit of this paper, [RFG09] derived expressions for the mean square error under the same model considered here. Similar results were presented recently in [KWT09, GBS09]. These papers argue that a sharp asymptotic characterization of the LASSO risk can provide valuable guidance in practical applications. For instance, it can be used to evaluate competing optimization methods on large scale applications, or to tune the regularization parameter $\lambda$.

Unfortunately, these results were non-rigorous and were obtained through the famously powerful 'replica method' from statistical physics [MM09].

Let us emphasize that the present paper offers two advantages over these recent developments: (i) 
It is completely rigorous, thus putting on a firmer basis this line of research; (ii) It is algorithmic in 
that the LASSO mean square error is shown to be equivalent to the one achieved by a low-complexity 
message passing algorithm. 

2 Numerical illustrations 

Theorem 1.4 assumes that the entries of matrix $A$ are iid gaussians. We expect however the mean square error prediction to be robust and hold for a much larger family of matrices. Rigorous evidence in this direction is presented in [KM10] where the normalized cost $\mathcal{C}(\hat{x})/N$ is shown to have a limit as $N \to \infty$ which is universal with respect to random matrices $A$ with iid entries. (More precisely, it is universal provided $\mathbb{E}\{A_{ij}\} = 0$, $\mathbb{E}\{A_{ij}^2\} = 1/n$ and $\mathbb{E}\{A_{ij}^6\} \le C/n^3$ for some uniform constant $C$.)
Further, our result is asymptotic, and one might wonder how accurate it is for instances of moderate dimensions.

Numerical simulations were carried out in [DMM10, BBM10] and suggest that the result is robust and relevant already for $N$ of the order of a few hundreds. As an illustration, we present in Figs. 4 and 5 the outcome of such simulations for two types of random matrices. Simulations with real data can be found in [BBM10]. We generated the signal vector randomly with entries in $\{+1, 0, -1\}$ and $\mathbb{P}(x_{0,i} = +1) = \mathbb{P}(x_{0,i} = -1) = 0.064$. The noise vector $w$ was generated by using i.i.d. $N(0, 0.2)$ entries.

We obtained the optimum estimator $\hat{x}$ using CVX, a package for specifying and solving convex programs [GB10], and OWLQN, a package for solving large-scale versions of LASSO [AJ07]. We used several values of $\lambda$ between 0 and 2 and $N$ equal to 200, 500, 1000, and 2000. The aspect ratio of matrices was fixed in all cases to $\delta = 0.64$. For each case, the point $(\lambda, \mathrm{MSE})$ was plotted and the results are shown in the figures. Continuous lines correspond to the asymptotic prediction by Theorem 1.4 for $\psi(a,b) = (a-b)^2$, namely

$$\lim_{N\to\infty}\frac{1}{N}\|\hat{x} - x_0\|^2 = \mathbb{E}\{[\eta(X_0 + \tau_*Z;\theta_*) - X_0]^2\} = \delta(\tau_*^2 - \sigma^2).$$

The agreement is remarkably good already for N, n of the order of a few hundreds, and deviations 
are consistent with statistical fluctuations. 

The two figures correspond to different entry distributions: (i) random gaussian matrices with aspect ratio $\delta$ and iid $N(0, 1/n)$ entries (as in Theorem 1.4); (ii) random $\pm1$ matrices with aspect ratio $\delta$. Each entry is independently equal to $+1/\sqrt{n}$ or $-1/\sqrt{n}$ with equal probability.
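The instances described above can be generated as follows. This is a sketch: the function `make_instance`, its defaults (which mirror the values quoted in this section) and the use of Python's `random` module are illustrative choices, not the code behind Figs. 4 and 5. It also computes the column norms $\|Ae_i\|_2$ that appear in condition (c) of Definition 1: they equal 1 exactly for the $\pm1/\sqrt{n}$ ensemble and concentrate around 1 for the gaussian one.

```python
import math
import random


def make_instance(N, delta, eps=0.064, sigma2=0.2, ensemble="gaussian", seed=None):
    """Random instance (A, x0, w, y) in the spirit of Section 2 (hypothetical helper).

    x0 has iid entries +1 and -1 with probability eps each and 0 otherwise; w has iid
    N(0, sigma2) entries; A is n x N with n = round(delta * N) and either iid N(0, 1/n)
    entries ("gaussian") or iid +-1/sqrt(n) entries ("rademacher").
    """
    rng = random.Random(seed)
    n = round(delta * N)
    x0 = []
    for _ in range(N):
        u = rng.random()
        x0.append(1.0 if u < eps else (-1.0 if u < 2 * eps else 0.0))
    if ensemble == "gaussian":
        A = [[rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(N)] for _ in range(n)]
    elif ensemble == "rademacher":
        A = [[rng.choice((1.0, -1.0)) / math.sqrt(n) for _ in range(N)] for _ in range(n)]
    else:
        raise ValueError("unknown ensemble: " + ensemble)
    w = [rng.gauss(0.0, math.sqrt(sigma2)) for _ in range(n)]
    y = [sum(a * xj for a, xj in zip(row, x0)) + wi for row, wi in zip(A, w)]
    return A, x0, w, y


def column_norms(A):
    """||A e_i||_2 for every column i, cf. condition (c) of Definition 1."""
    return [math.sqrt(sum(row[j] ** 2 for row in A)) for j in range(len(A[0]))]


A, x0, w, y = make_instance(200, 0.64, seed=2)
norms = column_norms(A)
print(len(A), len(A[0]), min(norms), max(norms))
```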

Notice that the asymptotic prediction has a minimum as a function of $\lambda$. The location of this minimum can be used to select the regularization parameter.
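The continuous curves, and the location of their minimum, can be computed without any simulation. For each $\alpha > \alpha_{\min}$ one obtains the point $(\lambda(\alpha), \delta(\tau_*^2 - \sigma^2))$ from (1.9), (1.11) and the MSE formula above, and sweeping $\alpha$ traces the whole curve. The sketch below does this for the setting of Figs. 4 and 5 and picks the grid point with the smallest predicted MSE; the grid of $\alpha$ values, the helper names and the restriction to $\lambda(\alpha) > 0$ are illustrative assumptions.

```python
import math


def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def soft_mse(x, tau, theta):
    """E{[eta(x + tau Z; theta) - x]^2} for Z ~ N(0, 1)."""
    a, b = (theta - x) / tau, (-theta - x) / tau
    upper = tau**2 * (a * phi(a) + Phi(-a)) - 2 * tau * theta * phi(a) + theta**2 * Phi(-a)
    lower = tau**2 * (-b * phi(b) + Phi(b)) - 2 * tau * theta * phi(b) + theta**2 * Phi(b)
    return upper + lower + x * x * (Phi(a) - Phi(b))


def tau_star(alpha, prior, delta, sigma2, tol=1e-14, max_iter=100000):
    """Fixed point of tau^2 = F(tau^2, alpha * tau), obtained by iterating Eq. (1.9)."""
    tau2 = sigma2 + sum(p * x * x for x, p in prior) / delta
    for _ in range(max_iter):
        tau = math.sqrt(tau2)
        new = sigma2 + sum(p * soft_mse(x, tau, alpha * tau) for x, p in prior) / delta
        if abs(new - tau2) < tol:
            return math.sqrt(new)
        tau2 = new
    raise RuntimeError("state evolution did not converge")


def predicted_curve(alphas, prior, delta, sigma2):
    """Parametric points (lambda(alpha), delta * (tau_*^2 - sigma^2)) of the asymptotic MSE curve."""
    points = []
    for alpha in alphas:
        tau = tau_star(alpha, prior, delta, sigma2)
        theta = alpha * tau
        # E{eta'(X0 + tau Z; theta)} = P{|X0 + tau Z| > theta}
        mean_eta_prime = sum(p * (Phi((x - theta) / tau) + Phi((-theta - x) / tau)) for x, p in prior)
        lam = theta * (1.0 - mean_eta_prime / delta)  # calibration, Eq. (1.11)
        points.append((lam, delta * (tau * tau - sigma2)))
    return points


prior = [(1.0, 0.064), (-1.0, 0.064), (0.0, 0.872)]  # setting of Figs. 4 and 5
alphas = [0.6 + 0.1 * k for k in range(35)]
curve = [pt for pt in predicted_curve(alphas, prior, 0.64, 0.2) if pt[0] > 0]
lam_best, mse_best = min(curve, key=lambda pt: pt[1])
print(lam_best, mse_best)
```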

3 A structural property and proof of the main theorem 

We will prove the following theorem, which implies our main result, Theorem 1.4.

Theorem 3.1. Assume the hypotheses of Theorem 1.4. Let $\hat{x}(\lambda;N)$ be the LASSO estimator for instance $(x_0(N), w(N), A(N))$, and denote by $\{x^t(N)\}_{t\ge0}$ the sequence of estimates produced by AMP. Then

$$\lim_{t\to\infty}\lim_{N\to\infty}\frac{1}{N}\|x^t(N) - \hat{x}(\lambda;N)\|_2^2 = 0, \qquad (3.1)$$

almost surely.

The rest of the paper is devoted to the proof of this theorem. Section 3.2 proves a structural property that is the key tool in this proof. Section 3.3 uses this property together with a few lemmas to prove Theorem 3.1.

The proof of Theorem 1.4 follows immediately.
Figure 4: Mean square error (MSE) as a function of the regularization parameter $\lambda$ compared to the asymptotic prediction for $\delta = 0.64$ and $\sigma^2 = 0.2$. Here the measurement matrix $A$ has iid $N(0, 1/n)$ entries. Each point in this plot is generated by finding the LASSO predictor $\hat{x}$ using a measurement vector $y = Ax_0 + w$ for an independent signal vector $x_0$, an independent noise vector $w$, and an independent matrix $A$.
Figure 5: As in Fig. 4 but the measurement matrix $A$ has iid entries that are equal to $\pm1/\sqrt{n}$ with equal probabilities.
Proof of Theorem 1.4. For any $t \ge 0$, we have, by the pseudo-Lipschitz property of $\psi$,

$$\begin{aligned} \Big|\frac{1}{N}\sum_{i=1}^N\psi(x_i^{t+1}, x_{0,i}) - \frac{1}{N}\sum_{i=1}^N\psi(\hat{x}_i, x_{0,i})\Big| &\le \frac{L}{N}\sum_{i=1}^N|x_i^{t+1} - \hat{x}_i|\,\big(1 + 2|x_{0,i}| + |x_i^{t+1}| + |\hat{x}_i|\big)\\ &\le L\,\frac{\|x^{t+1} - \hat{x}\|_2}{\sqrt{N}}\,\sqrt{4 + \frac{16\|x_0\|_2^2}{N} + \frac{4\|x^{t+1}\|_2^2}{N} + \frac{4\|\hat{x}\|_2^2}{N}}, \end{aligned}$$

where the second inequality follows by Cauchy-Schwarz, together with $(a_1 + a_2 + a_3 + a_4)^2 \le 4(a_1^2 + a_2^2 + a_3^2 + a_4^2)$. Next we take the limit $N \to \infty$ followed by $t \to \infty$. The first term vanishes by Theorem 3.1. For the second term, note that $\|x_0\|_2^2/N$ remains bounded since $(x_0, w, A)$ is a converging sequence. The two terms $\|x^{t+1}\|_2^2/N$ and $\|\hat{x}\|_2^2/N$ also remain bounded in this limit because of state evolution (as proved in Lemma 3.3 below). We then obtain

$$\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^N\psi(\hat{x}_i, x_{0,i}) = \lim_{t\to\infty}\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^N\psi(x_i^{t+1}, x_{0,i}) = \mathbb{E}\{\psi(\eta(X_0 + \tau_*Z;\theta_*), X_0)\},$$

where we used Theorem 1.1 and Proposition 1.2. □



3.1 Some notations 

Before continuing, we introduce some useful notations. For any non-empty subset $S$ of $[m]$ and any $k \times m$ matrix $M$ we refer by $M_S$ to the $k$ by $|S|$ sub-matrix of $M$ that contains only the columns of $M$ corresponding to $S$. The same notation is used for vectors $v \in \mathbb{R}^m$: $v_S$ is the vector $(v_i : i \in S)$.

The transpose of matrix $M$ is denoted by $M^*$.

We will often use the following scalar product for $u, v \in \mathbb{R}^m$:

$$\langle u, v\rangle \equiv \frac{1}{m}\sum_{i=1}^m u_iv_i. \qquad (3.2)$$

Finally, the subgradient of a convex function $f: \mathbb{R}^m \to \mathbb{R}$ at point $x \in \mathbb{R}^m$ is denoted by $\partial f(x)$. In particular, remember that the subgradient of the $\ell_1$ norm, $x \mapsto \|x\|_1$, is given by

$$\partial\|x\|_1 = \{v \in \mathbb{R}^m \text{ such that } |v_i| \le 1\ \forall i \text{ and } x_i \neq 0 \Rightarrow v_i = \mathrm{sign}(x_i)\}. \qquad (3.3)$$
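Since $\mathcal{C}_{A,y}$ is convex, $\hat{x}$ minimizes it if and only if $0 \in \partial\mathcal{C}(\hat{x})$, i.e. $A^*(y - A\hat{x}) = \lambda v$ for some $v \in \partial\|\hat{x}\|_1$. By (3.3) this means $[A^*(y - A\hat{x})]_i = \lambda\,\mathrm{sign}(\hat{x}_i)$ on the support of $\hat{x}$ and $|[A^*(y - A\hat{x})]_i| \le \lambda$ elsewhere. A hypothetical helper that measures how far a candidate vector is from satisfying this condition could look as follows (pure Python, a sketch).

```python
import math


def lasso_kkt_violation(A, y, x, lam):
    """Largest violation of A^*(y - A x) = lam * v with v in the subdifferential (3.3) of ||x||_1.

    Returns 0 at a LASSO minimizer; larger values mean x is further from optimality.
    """
    n, N = len(A), len(A[0])
    r = [yi - sum(a * xj for a, xj in zip(row, x)) for row, yi in zip(A, y)]
    c = [sum(A[i][j] * r[i] for i in range(n)) for j in range(N)]  # correlations A^*(y - A x)
    worst = 0.0
    for cj, xj in zip(c, x):
        if xj != 0.0:
            worst = max(worst, abs(cj - lam * math.copysign(1.0, xj)))  # need v_j = sign(x_j)
        else:
            worst = max(worst, abs(cj) - lam)  # need |v_j| = |c_j| / lam <= 1
    return worst


I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(lasso_kkt_violation(I3, [2.0, -0.5, 0.3], [1.0, 0.0, 0.0], 1.0))
```

With $A = I$, $y = (2, -0.5, 0.3)$ and $\lambda = 1$, the soft-thresholded vector $(1, 0, 0)$ satisfies the condition exactly, while $x = y$ violates it by 1.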



3.2 A structural property of the LASSO cost function 

One main challenge in the proof of Theorem 1.4 lies in the fact that the function $x \mapsto \mathcal{C}_{A,y}(x)$ is not (in general) strictly convex. Hence there can be, in principle, vectors $x$ of cost very close to the optimum and nevertheless far from the optimum.

The following Lemma provides conditions under which this does not happen. 

Lemma 3.2. There exists a function $\xi(\varepsilon, c_1, \dots, c_5)$ such that the following happens. If $x, r \in \mathbb{R}^N$ satisfy the following conditions

1. $\|r\|_2 \le c_1\sqrt{N}$;

2. $\mathcal{C}(x + r) \le \mathcal{C}(x)$;

3. There exists $\mathrm{sg}(\mathcal{C}, x) \in \partial\mathcal{C}(x)$ with $\|\mathrm{sg}(\mathcal{C}, x)\|_2 \le \sqrt{N}\varepsilon$;

4. Let $v \equiv (1/\lambda)[A^*(y - Ax) + \mathrm{sg}(\mathcal{C}, x)] \in \partial\|x\|_1$, and $S(c_2) \equiv \{i \in [N] : |v_i| \ge 1 - c_2\}$. Then, for any $S' \subseteq [N]$, $|S'| \le c_3N$, we have $\sigma_{\min}(A_{S(c_2)\cup S'}) \ge c_4$;

5. The maximum and minimum non-zero singular value of $A$ satisfy $c_5^{-1} \le \sigma_{\min}(A)^2 \le \sigma_{\max}(A)^2 \le c_5$.

Then $\|r\|_2 \le \sqrt{N}\,\xi(\varepsilon, c_1, \dots, c_5)$. Further for any $c_1, \dots, c_5 > 0$, $\xi(\varepsilon, c_1, \dots, c_5) \to 0$ as $\varepsilon \to 0$. Further, if $\ker(A) = \{0\}$, the same conclusion holds under assumptions 1, 2, 3 and 5.

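Proof. Throughout the proof we denote by $\xi_1, \xi_2, \dots$ functions of the constants $c_1, \dots, c_5 > 0$ and of $\varepsilon$ such that $\xi_i(\varepsilon) \to 0$ as $\varepsilon \to 0$ (we shall omit the dependence of $\xi_i$ on $\varepsilon$). In this proof all scalar products are normalized by $N$, i.e. $\langle u, v\rangle = N^{-1}\sum_i u_iv_i$, also for sub-vectors.

Let $S = \mathrm{supp}(x) \subseteq [N]$ and $\bar S = [N] \setminus S$. We have

$$\begin{aligned} 0 &\overset{(a)}{\ge} \frac{\mathcal{C}(x + r) - \mathcal{C}(x)}{N}\\ &\overset{(b)}{=} \lambda\,\frac{\|x_S + r_S\|_1 - \|x_S\|_1}{N} + \frac{\lambda\|r_{\bar S}\|_1}{N} + \frac{\|y - Ax - Ar\|_2^2 - \|y - Ax\|_2^2}{2N}\\ &\overset{(c)}{=} \lambda\Big(\frac{\|x_S + r_S\|_1 - \|x_S\|_1}{N} - \langle\mathrm{sign}(x_S), r_S\rangle\Big) + \lambda\Big(\frac{\|r_{\bar S}\|_1}{N} - \langle v_{\bar S}, r_{\bar S}\rangle\Big) + \lambda\langle v, r\rangle - \langle A^*(y - Ax), r\rangle + \frac{\|Ar\|_2^2}{2N}\\ &\overset{(d)}{=} \lambda\Big(\frac{\|x_S + r_S\|_1 - \|x_S\|_1}{N} - \langle\mathrm{sign}(x_S), r_S\rangle\Big) + \lambda\Big(\frac{\|r_{\bar S}\|_1}{N} - \langle v_{\bar S}, r_{\bar S}\rangle\Big) + \langle\mathrm{sg}(\mathcal{C}, x), r\rangle + \frac{\|Ar\|_2^2}{2N}, \end{aligned}$$

where (a) follows from hypothesis 2, (b) from $x_{\bar S} = 0$, (c) from the fact that $v_S = \mathrm{sign}(x_S)$ since $v \in \partial\|x\|_1$, and (d) from the definition of $v$. Using hypotheses 1 and 3, we get by Cauchy-Schwarz

$$\lambda\Big(\frac{\|x_S + r_S\|_1 - \|x_S\|_1}{N} - \langle\mathrm{sign}(x_S), r_S\rangle\Big) + \lambda\Big(\frac{\|r_{\bar S}\|_1}{N} - \langle v_{\bar S}, r_{\bar S}\rangle\Big) + \frac{\|Ar\|_2^2}{2N} \le c_1\varepsilon.$$

Since each of the three terms on the left-hand side is non-negative it follows that

$$\frac{\|r_{\bar S}\|_1}{N} - \langle v_{\bar S}, r_{\bar S}\rangle \le \xi_1(\varepsilon), \qquad (3.4)$$

$$\|Ar\|_2^2 \le N\xi_1(\varepsilon). \qquad (3.5)$$

Write $r = r^\perp + r^\parallel$, with $r^\parallel \in \ker(A)$ and $r^\perp \perp \ker(A)$. It follows from Eq. (3.5) and hypothesis 5 that

$$\|r^\perp\|_2^2 \le Nc_5\xi_1(\varepsilon). \qquad (3.6)$$

In the case $\ker(A) = \{0\}$, the proof is concluded. In the case $\ker(A) \neq \{0\}$, we need to prove an analogous bound for $r^\parallel$. From Eq. (3.4) together with $\|r^\perp_{\bar S}\|_1 \le \sqrt{N}\|r^\perp_{\bar S}\|_2 \le \sqrt{N}\|r^\perp\|_2 \le N\sqrt{c_5\xi_1(\varepsilon)}$, we get

$$Ar^\parallel = 0, \qquad (3.7)$$

$$\frac{\|r^\parallel_{\bar S}\|_1}{N} - \langle v_{\bar S}, r^\parallel_{\bar S}\rangle \le \xi_2(\varepsilon). \qquad (3.8)$$

Notice that $S \subseteq S(c_2)$, since $|v_i| = 1$ for $i \in S$. Writing $\overline{S(c_2)} = [N] \setminus S(c_2) \subseteq \bar S$ and using $1 - |v_i| > c_2$ for $i \in \overline{S(c_2)}$, it follows from Eq. (3.8) that

$$\frac{c_2}{N}\|r^\parallel_{\overline{S(c_2)}}\|_1 \le \frac{1}{N}\sum_{i\in\overline{S(c_2)}}(1 - |v_i|)\,|r^\parallel_i| \le \xi_2(\varepsilon), \qquad (3.9)$$

$$\|r^\parallel_{\overline{S(c_2)}}\|_1 \le \frac{N\xi_2(\varepsilon)}{c_2}. \qquad (3.10)$$

Let us first consider the case $|\overline{S(c_2)}| \ge Nc_3/2$. Then partition $\overline{S(c_2)} = \cup_{\ell=1}^K S_\ell$, where $Nc_3/2 \le |S_\ell| \le Nc_3$, and for each $i \in S_\ell$, $j \in S_{\ell+1}$, $|r^\parallel_i| \ge |r^\parallel_j|$. Also define $S_+ = \cup_{\ell=2}^K S_\ell \subseteq \overline{S(c_2)}$. Since, for any $i \in S_\ell$, $|r^\parallel_i| \le \|r^\parallel_{S_{\ell-1}}\|_1/|S_{\ell-1}|$, we have

$$\|r^\parallel_{S_+}\|_2^2 = \sum_{\ell=2}^K\|r^\parallel_{S_\ell}\|_2^2 \le \sum_{\ell=2}^K|S_\ell|\Big(\frac{\|r^\parallel_{S_{\ell-1}}\|_1}{|S_{\ell-1}|}\Big)^2 \le \frac{4}{Nc_3}\sum_{\ell=2}^K\|r^\parallel_{S_{\ell-1}}\|_1^2 \le \frac{4}{Nc_3}\|r^\parallel_{\overline{S(c_2)}}\|_1^2 \le \frac{4\xi_2(\varepsilon)^2}{c_2^2c_3}N \equiv N\xi_3(\varepsilon).$$

To conclude the proof, it is sufficient to prove an analogous bound for $\|r^\parallel_{\bar S_+}\|_2^2$ with $\bar S_+ = [N] \setminus S_+ = S(c_2) \cup S_1$. Since $|S_1| \le Nc_3$, we have by hypothesis 4 that $\sigma_{\min}(A_{\bar S_+}) \ge c_4$. Since $0 = Ar^\parallel = A_{\bar S_+}r^\parallel_{\bar S_+} + A_{S_+}r^\parallel_{S_+}$, we have

$$c_4\|r^\parallel_{\bar S_+}\|_2 \le \|A_{\bar S_+}r^\parallel_{\bar S_+}\|_2 = \|A_{S_+}r^\parallel_{S_+}\|_2 \le \sqrt{c_5}\,\|r^\parallel_{S_+}\|_2 \le \sqrt{c_5N\xi_3(\varepsilon)}.$$

This finishes the proof when $|\overline{S(c_2)}| \ge Nc_3/2$. Note that if this assumption does not hold then we have $S_+ = \emptyset$ and $\bar S_+ = [N]$. Hence, the result follows as a special case of the above. □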



3.3 Proof of Theorem 3.1

The proof is based on a series of Lemmas that are used to check the assumptions of Lemma 3.2.

The first one is an upper bound on the $\ell_2$-norm of AMP estimates, and of the LASSO estimate. Its proof is deferred to Section 5.1.

Lemma 3.3. Under the conditions of Theorem 1.4, assume $\lambda > 0$ and $\alpha = \alpha(\lambda)$. Denote by $\hat{x}(\lambda;N)$ the LASSO estimator and by $\{x^t(N)\}$ the sequence of AMP estimates. Then there is a constant $B$ such that for all $t \ge 0$, almost surely

$$\lim_{N\to\infty}\langle x^t(N), x^t(N)\rangle < B, \qquad (3.11)$$

$$\lim_{N\to\infty}\langle\hat{x}(\lambda;N), \hat{x}(\lambda;N)\rangle < B. \qquad (3.12)$$

The second Lemma implies that the estimates of AMP are approximate minima, in the sense that the cost function $\mathcal{C}$ admits a small subgradient at $x^t$, when $t$ is large. The proof is deferred to Section 5.2.
Lemma 3.4. Under the conditions of Theorem 1.4, for all $t$ there exists a subgradient $\mathrm{sg}(\mathcal{C}, x^t)$ of $\mathcal{C}$ at point $x^t$ such that almost surely,

$$\lim_{t\to\infty}\lim_{N\to\infty}\frac{1}{N}\|\mathrm{sg}(\mathcal{C}, x^t)\|_2^2 = 0. \qquad (3.13)$$

The next lemma implies that submatrices of $A$ constructed using the first $t$ iterations of the AMP algorithm are non-singular (more precisely, have singular values bounded away from 0). The proof can be found in Section 5.3.

Lemma 3.5. Let $S \subseteq [N]$ be measurable on the σ-algebra $\mathfrak{S}_t$ generated by $\{z^0, \dots, z^{t-1}\}$ and $\{x^0 + A^*z^0, \dots, x^{t-1} + A^*z^{t-1}\}$ and assume $|S| \le N(\delta - c)$ for some $c > 0$. Then there exists $a_1 = a_1(c) > 0$ (independent of $t$) and $a_2 = a_2(c, t) > 0$ (depending on $t$ and $c$) such that

$$\min_{S'}\{\sigma_{\min}(A_{S\cup S'}) : S' \subseteq [N],\ |S'| \le a_1N\} \ge a_2, \qquad (3.14)$$

with probability converging to 1 as $N \to \infty$.

We will apply this lemma to a specific choice of the set $S$. Namely, defining

$$v^t \equiv \frac{1}{\theta_{t-1}}(x^{t-1} + A^*z^{t-1} - x^t), \qquad (3.15)$$

we will then consider the set

$$S_t(\gamma) \equiv \{i \in [N] : |v_i^t| \ge 1 - \gamma\}, \qquad (3.16)$$

for $\gamma \in (0,1)$. Our last lemma shows that this sequence of sets $S_t(\gamma)$ 'converges' in the following sense. The proof can be found in Section 5.4.
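In code, (3.15) and (3.16) only require the input $u = x^{t-1} + A^*z^{t-1}$ of the thresholding step. Since $u - \eta(u;\theta)$ equals $\theta\,\mathrm{sign}(u)$ when $|u| > \theta$ and $u$ otherwise, every entry of $v^t$ lies in $[-1, 1]$ and $v_i^t = \mathrm{sign}(x_i^t)$ wherever $x_i^t \neq 0$. The helper below (a sketch; its name is hypothetical, and indices are 0-based) returns $x^t$, $v^t$ and $S_t(\gamma)$ for a given $u$.

```python
def eta(u, theta):
    """Soft thresholding, Eq. (1.4)."""
    return u - theta if u > theta else (u + theta if u < -theta else 0.0)


def v_and_support(u_prev, theta_prev, gamma):
    """Given u = x^{t-1} + A^* z^{t-1} and theta_{t-1}, return x^t, v^t (Eq. (3.15)) and S_t(gamma) (Eq. (3.16))."""
    x_t = [eta(u, theta_prev) for u in u_prev]
    v_t = [(u - x) / theta_prev for u, x in zip(u_prev, x_t)]
    S_t = {i for i, v in enumerate(v_t) if abs(v) >= 1.0 - gamma}
    return x_t, v_t, S_t


x, v, S = v_and_support([3.0, -0.5, 0.95, -2.0], theta_prev=1.0, gamma=0.1)
print(x, v, sorted(S))
```

Entries that are zeroed but lie just below the threshold (here the third one) still enter $S_t(\gamma)$; this is why Lemma 3.5 is applied to $S_t(\gamma)$ rather than to the support of $x^t$.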

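Lemma 3.6. Fix $\gamma \in (0,1)$ and let the sequence $\{S_t(\gamma)\}_{t\ge0}$ be defined as in Eq. (3.16) above. For any $\xi > 0$ there exists $t_* = t_*(\xi, \gamma) < \infty$ such that, for all $t_2 \ge t_1 \ge t_*$

$$\lim_{N\to\infty}\mathbb{P}\{|S_{t_2}(\gamma) \setminus S_{t_1}(\gamma)| \ge N\xi\} = 0. \qquad (3.17)$$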

The last two lemmas imply the following. 
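Proposition 3.7. There exist constants $\gamma_1 \in (0,1)$, $\gamma_2, \gamma_3 > 0$ and $t_{\min} < \infty$ such that, for any $t \ge t_{\min}$,

$$\min\{\sigma_{\min}(A_{S_t(\gamma_1)\cup S'}) : S' \subseteq [N],\ |S'| \le \gamma_2N\} \ge \gamma_3 \qquad (3.18)$$

with probability converging to 1 as $N \to \infty$.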

Proof. First notice that, for any fixed $\gamma$, the set $S_t(\gamma)$ is measurable on $\mathfrak{S}_t$. Indeed by Eq. (1.5) $\mathfrak{S}_t$ contains $\{x^0, \dots, x^t\}$ as well, and hence it contains $v^t$, which is a linear combination of $x^{t-1} + A^*z^{t-1}$ and $x^t$. Finally $S_t(\gamma)$ is obviously a measurable function of $v^t$.
Using Lemma F.3(b), the empirical distribution of $(x_0 - A^*z^{t-1} - x^{t-1}, x_0)$ converges weakly to $(\tau_{t-1}Z, X_0)$ for $Z \sim N(0,1)$ independent of $X_0 \sim p_{X_0}$. (Following the notation of [BM10], we let $h^t = x_0 - A^*z^{t-1} - x^{t-1}$.) Therefore, for any constant $\gamma$ we have almost surely

$$\begin{aligned} \lim_{N\to\infty}\frac{|S_t(\gamma)|}{N} &= \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^N\mathbb{I}\Big\{\frac{1}{\theta_{t-1}}\big|x_i^{t-1} + [A^*z^{t-1}]_i - x_i^t\big| \ge 1 - \gamma\Big\} && (3.19)\\ &= \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^N\mathbb{I}\Big\{\frac{1}{\theta_{t-1}}\big|x_{0,i} - h_i^t - \eta(x_{0,i} - h_i^t;\theta_{t-1})\big| \ge 1 - \gamma\Big\} && (3.20)\\ &= \mathbb{P}\Big\{\frac{1}{\theta_{t-1}}\big|X_0 + \tau_{t-1}Z - \eta(X_0 + \tau_{t-1}Z;\theta_{t-1})\big| \ge 1 - \gamma\Big\}. && (3.21) \end{aligned}$$

The last equality follows from the weak convergence of the empirical distribution of $\{(h_i^t, x_{0,i})\}_{i\in[N]}$ (from Lemma F.3(b), which takes the same form as Theorem 1.1), together with the absolute continuity of the distribution of $|X_0 + \tau_{t-1}Z - \eta(X_0 + \tau_{t-1}Z;\theta_{t-1})|$.
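Now, combining

$$|X_0 + \tau_{t-1}Z - \eta(X_0 + \tau_{t-1}Z;\theta_{t-1})| = \begin{cases} \theta_{t-1} & \text{when } |X_0 + \tau_{t-1}Z| \ge \theta_{t-1},\\ |X_0 + \tau_{t-1}Z| & \text{otherwise}, \end{cases}$$

and Eq. (3.21) we obtain almost surely

$$\lim_{N\to\infty}\frac{|S_t(\gamma)|}{N} = \mathbb{E}\{\eta'(X_0 + \tau_{t-1}Z;\theta_{t-1})\} + \mathbb{P}\Big\{(1 - \gamma) \le \frac{1}{\theta_{t-1}}|X_0 + \tau_{t-1}Z| < 1\Big\}. \qquad (3.22)$$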



It is easy to see that the second term $\mathbb{P}\{1 - \gamma \le (1/\theta_{t-1})|X_0 + \tau_{t-1}Z| < 1\}$ converges to 0 as $\gamma \to 0$. On the other hand, using Eq. (1.11) and the fact that $\lambda(\alpha) > 0$, the first term will be strictly smaller than $\delta$ for large enough $t$. Hence, we can choose constants $\gamma_1 \in (0,1)$ and $c > 0$ such that

$$\lim_{N\to\infty}\mathbb{P}\{|S_t(\gamma_1)| < N(\delta - c)\} = 1, \qquad (3.23)$$

for all $t$ larger than some $t_{\min,1}(c)$.

For any $t > t_{\min,1}(c)$ we can apply Lemma 3.5 for some $a_1(c), a_2(c, t) > 0$. Fix $c > 0$ and let $a_1 = a_1(c)$ be fixed as well. Let $t_{\min} = \max(t_{\min,1}, t_*(a_1/2, \gamma_1))$ (with $t_*(\cdot)$ defined as per Lemma 3.6). Take $a_2 = a_2(c, t_{\min})$. Obviously $t \mapsto a_2(c, t)$ is non-increasing. Then we have, by Lemma 3.5,

$$\min\{\sigma_{\min}(A_{S_{t_{\min}}(\gamma_1)\cup S'}) : S' \subseteq [N],\ |S'| \le a_1N\} \ge a_2, \qquad (3.24)$$

and by Lemma 3.6,

$$|S_t(\gamma_1) \setminus S_{t_{\min}}(\gamma_1)| \le Na_1/2, \qquad (3.25)$$

where both events hold with probability converging to 1 as $N \to \infty$. The claim follows with $\gamma_2 = a_1(c)/2$ and $\gamma_3 = a_2(c, t_{\min})$. □

We are now in a position to prove Theorem 3.1.
Proof of Theorem 3.1. We apply Lemma 3.2 to $x = x^t$, the AMP estimate, and $r = \hat{x} - x^t$, the distance from the LASSO optimum. The thesis follows by checking conditions 1-5. Namely we need to show that there exist constants $c_1, \dots, c_5 > 0$ and, for each $\varepsilon > 0$, some $t = t(\varepsilon)$ such that 1-5 hold with probability going to 1 as $N \to \infty$.

Condition 1 holds by Lemma 3.3.

Condition 2 is immediate since $x + r = \hat{x}$ minimizes $\mathcal{C}(\,\cdot\,)$.

Condition 3 follows from Lemma 3.4 with $\varepsilon$ arbitrarily small for $t$ large enough.
Condition 4. Notice that this condition only needs to be verified for $\delta < 1$.

Take $v = v^t$ as defined in Eq. (3.15). Using the definition (1.5), it is easy to check that $|v_i^t| \le 1$ if $x_i^t = 0$ and $v_i^t = \mathrm{sign}(x_i^t)$ otherwise. In other words $v \in \partial\|x^t\|_1$ as required. Further, by inspection of the proof of Lemma 3.4 it follows that $v^t = (1/\lambda)[A^*(y - Ax^t) + \mathrm{sg}(\mathcal{C}, x^t)]$, with $\mathrm{sg}(\mathcal{C}, x^t)$ the subgradient bounded in that lemma (cf. Section 5.2). The condition then holds by Proposition 3.7.

Condition 5 follows from standard limit theorems on the singular values of Wishart matrices (cf. Theorem F.1). □
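The mechanism behind Condition 4 is easiest to see at an exact fixed point of (1.5). If $x^t = x$, $z^t = z$ and $\theta_t = \theta$ for all large $t$, the second line of (1.5) gives $(1 - \omega)z = y - Ax$ with $\omega = \frac{1}{\delta}\langle\eta'\rangle = \|x\|_0/n$, and the first line gives $A^*z = \theta v$ with $v \in \partial\|x\|_1$; hence $A^*(y - Ax) = \theta(1 - \omega)v$, so $x$ is a LASSO minimizer for $\lambda = \theta(1 - \omega)$, an empirical counterpart of the calibration (1.11). The sketch below checks this numerically on a small gaussian instance. It assumes thresholds $\theta_t = \alpha\|z^t\|_2/\sqrt{n}$, uses hypothetical helper names and an arbitrary toy instance, and illustrates, without of course proving, Theorem 3.1.

```python
import math
import random


def eta(u, theta):
    """Soft thresholding, Eq. (1.4)."""
    return u - theta if u > theta else (u + theta if u < -theta else 0.0)


def amp_with_calibration(A, y, alpha, iters=80):
    """Run AMP (1.5) with theta_t = alpha * ||z^t|| / sqrt(n); return x and lambda_hat = theta * (1 - omega)."""
    n, N = len(A), len(A[0])
    x, z = [0.0] * N, list(y)
    for _ in range(iters):
        theta = alpha * math.sqrt(sum(zi * zi for zi in z) / n)
        u = [x[j] + sum(A[i][j] * z[i] for i in range(n)) for j in range(N)]
        x = [eta(uj, theta) for uj in u]
        omega = sum(1 for xj in x if xj != 0.0) / n  # (1/delta) <eta'> = ||x||_0 / n
        Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
        z = [yi - axi + omega * zi for yi, axi, zi in zip(y, Ax, z)]
    return x, theta * (1.0 - omega)


def lasso_kkt_violation(A, y, x, lam):
    """Largest violation of A^*(y - A x) = lam * v with v in the subdifferential of ||x||_1."""
    n, N = len(A), len(A[0])
    r = [yi - sum(a * xj for a, xj in zip(row, x)) for row, yi in zip(A, y)]
    c = [sum(A[i][j] * r[i] for i in range(n)) for j in range(N)]
    worst = 0.0
    for cj, xj in zip(c, x):
        gap = abs(cj - lam * math.copysign(1.0, xj)) if xj != 0.0 else abs(cj) - lam
        worst = max(worst, gap)
    return worst


rng = random.Random(3)
N, n = 150, 96  # delta = 0.64
x0 = [0.0] * N
for i in rng.sample(range(N), 19):
    x0[i] = rng.choice((1.0, -1.0))
A = [[rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(N)] for _ in range(n)]
y = [sum(a * xj for a, xj in zip(row, x0)) + rng.gauss(0.0, 0.1) for row in A]
x_amp, lam_hat = amp_with_calibration(A, y, alpha=1.5)
print(lam_hat, lasso_kkt_violation(A, y, x_amp, lam_hat))
```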

4 State evolution estimates 

This section contains a reminder of the state-evolution method developed in [BM10]. We also state some extensions of those results that will be proved in the appendices.

4.1 State evolution 

AMP, cf. Eq. (1.5), is a special case of the general iterative procedure given by Eq. (3.1) of [BM10]. This takes the general form

$$\begin{aligned} h^{t+1} &= A^*m^t - \xi_tq^t, & m^t &= g_t(b^t, w),\\ b^t &= Aq^t - \lambda_tm^{t-1}, & q^t &= f_t(h^t, x_0), \end{aligned} \qquad (4.1)$$

where $\xi_t = \langle g_t'(b^t, w)\rangle$, $\lambda_t = \frac{1}{\delta}\langle f_t'(h^t, x_0)\rangle$ (both derivatives are with respect to the first argument). This reduction can be seen by defining

$$h^{t+1} = x_0 - (A^*z^t + x^t), \qquad (4.2)$$

$$q^t = x^t - x_0, \qquad (4.3)$$

$$b^t = w - z^t, \qquad (4.4)$$

$$m^t = -z^t, \qquad (4.5)$$

where

$$f_t(s, x_0) = \eta_{t-1}(x_0 - s) - x_0, \qquad g_t(s, w) = s - w, \qquad (4.6)$$

and the initial condition is $q^0 = -x_0$.
Regarding $h^t$, $b^t$ as column vectors, the equations for $b^0, \dots, b^{t-1}$ and $h^1, \dots, h^t$ can be written in matrix form as:

$$\underbrace{[h^1 + \xi_0q^0\,|\,h^2 + \xi_1q^1\,|\cdots|\,h^t + \xi_{t-1}q^{t-1}]}_{X_t} = A^*\underbrace{[m^0\,|\cdots|\,m^{t-1}]}_{M_t}, \qquad (4.7)$$

$$\underbrace{[b^0\,|\,b^1 + \lambda_1m^0\,|\cdots|\,b^{t-1} + \lambda_{t-1}m^{t-2}]}_{Y_t} = A\underbrace{[q^0\,|\cdots|\,q^{t-1}]}_{Q_t}, \qquad (4.8)$$

or in short $Y_t = AQ_t$ and $X_t = A^*M_t$.

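Following [BM10], we define $\mathfrak{S}_t$ as the σ-algebra generated by $b^0, \dots, b^{t-1}$, $m^0, \dots, m^{t-1}$, $h^1, \dots, h^t$, and $q^0, \dots, q^t$. The conditional distribution of the random matrix $A$ given the σ-algebra $\mathfrak{S}_t$ is given by

$$A|_{\mathfrak{S}_t} \overset{d}{=} E_t + \mathcal{P}_t(\tilde{A}). \qquad (4.9)$$

Here $\tilde{A} \overset{d}{=} A$ is a random matrix independent of $\mathfrak{S}_t$, and $E_t = \mathbb{E}(A|\mathfrak{S}_t)$ is given by

$$E_t = Y_t(Q_t^*Q_t)^{-1}Q_t^* + M_t(M_t^*M_t)^{-1}X_t^* - M_t(M_t^*M_t)^{-1}M_t^*Y_t(Q_t^*Q_t)^{-1}Q_t^*. \qquad (4.10)$$

Further, $\mathcal{P}_t$ is the orthogonal projector onto the subspace $V_t = \{\hat{A} \,|\, \hat{A}Q_t = 0,\ \hat{A}^*M_t = 0\}$, defined by

$$\mathcal{P}_t(\tilde{A}) = P^\perp_{M_t}\tilde{A}P^\perp_{Q_t}.$$

Here $P^\perp_{M_t} = I - P_{M_t}$, $P^\perp_{Q_t} = I - P_{Q_t}$, and $P_{Q_t}$, $P_{M_t}$ are orthogonal projectors onto the column spaces of $Q_t$ and $M_t$ respectively.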

Before proceeding, it is convenient to introduce the notation 

u t = ^(r ] '(A*z t - 1 + x t - 1 ;e^ 1 )) 

to denote the coefficient of z l ~ l in Eq. (|1.5p . Using h f = xq — A*z t ~ 1 — and Lemma lF.3f b) 
(proved in |BM10j ) we get, almost surely, 

lim = = ]m[t/(X + T t -iZ;0t-i)] . (4.11) 

TV— >oo 

Notice that the function rj'(-;6t-i) is discontinuous and therefore Lemma IF. 3( b) does not ap- 
ply immediately. On the other hand, this implies that the empirical distribution of {(A*z t ~ l + 
,xo,i)}i<i<N converges weakly to the distribution of (Xq + Tt-\Z, Xq). The claim follows from 
the fact that Xq + Tt-\Z has a density, together with the standard properties of weak convergence. 

4.2 Some consequences and generalizations 

We begin with a simple calculation, that will be useful. 
Lemma 4.1. If {z t }t>o are the AMP residuals, then 

lim — 1 1 1 1 2 = T> 2 - (4.12) 

n->oo n 
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Proof. Using representation (|4.5[) and Lemma fF.3f b) (c) . we get 

lim - Wz'f ^ lim - Wm'f ^ lim - \\h t+1 f = r 2 . 

n— >oo n n— »oo fl TV-s-oo AT 



□ 



Next, we need to generalize state evolution to compute large system limits for functions of x l , 
x s , with t ^ s. To this purpose, we define the covariances {R Si t} Si j>o recursively by 

R s+M+ i = a 2 + - § e{ [ V (Xo + Z s ; 0„) - X ] [ V (X + Z t ; t ) - X }} , (4.13) 

with (Z s ,Zt) jointly gaussian, independent from Xq ~ px with zero mean and covariance given 
by E{Z 2 } = R s>s , E{Z?} = R t>u E{Z s Z t } = R Sjt . The boundary condition is fixed by letting 
R 00 = a 2 + E{X$}/5 and 

R ,t+i = a 2 + ^e{[ V (X + Z t ;d t ) - X ](-X )} , (4.14) 

with Zt ~ N(0, Rtt) independent of Xq. This determines by the above recursion Rt jS for all t > 
and for all s > 0. 

With these definition, we have the following generalization of Theorem II. 11 

Theorem 4.2. Let {xo(N), w(N), A(N)}]y & fq be a converging sequence of instances with the entries 
of A(N) iid normal with mean and variance 1/n and let tp : M 3 — >■ M be a pseudo-Lipschitz function. 
Then, for all s > and t > almost surely 



1 N 

Iw^ - J2H x i + ( A * z ^ < + ( A * z % x o, 4 ) = e{^(X + Z s , X + 2k, X ) } , (4.15) 



i=i 

where (Z s , Zt) jointly gaussian, independent from Xq ~ px with zero mean and covariance given by 
E{Z 2 } = R S}S> E{Z 2 } = R t , t , E{Z s Z t } = R s , t . 

Notice that the above implies in particular, for any pseudo-Lipschitz function tp : R 3 — > M, 

N 



1 N 

hrn^ - 1>{xt +1 , xl + \ x ,i) = E{i>( V (X + Z s ; e s ), V (X + Z t ; t ),X o ) } . (4.16) 
°° i=i 



Clearly this result reduces to Theorem 11.11 in the case s = t by noting that Rt : t = r 2 . The general 
proof can be found in Appendix iBl 

The following lemma implies that, asymptotically for large N, the AMP estimates converge. 

Lemma 4.3. Under the condition of Theorem \ l-4\ the estimates {x l }t>o and residuals {z t }t>o of 
AMP almost surely satisfy 

lim lim — Mac* - x* _1 || 2 = , lim lim — llz 1 - z^H 2 = . (4.17) 

t^-ooN^ooN t^ooN->ocN 

The proof is deferred to Appendix O 
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5 Proofs of auxiliary lemmas 
5.1 Proof of Lemma 13.31 

In order to bound the norm of x*, we use state evolution, Theorem II, 1\ for the function ?/>(a, b) = a 2 , 

lim lim (x t ,x t ) =' E{?7(Xo + nZ;^) 2 ) 

t->oo 7V->oo 

for Z ~ N(0, 1) and independent of Xq ~ px - The expectation on the right hand side is bounded 
and hence lim t ^ > . oo limi^_ > . O0 (x t , x l ) is bounded. 
For x, first note that 

Lc(x) < — C(0) = — llyll 2 



— — — II Axn + it'll 2 



2^11^0 

"w\\ 2 + a max (A) 2 \\x f 2 



< ^ ' " U " < Bi. (5.1) 

The last bound holds almost surely as N — > oo, using standard asymptotic estimate on the singular 
values of random matrices (cf. Theorem IF.2|) implying that 0" max (j4) has a bounded limit almost 
surely, together with the fact that (xo,w,A) is a converging sequence. 

Now, decompose x as x = x\\ + Sj_ where G ker(^4) and ir^ E ker(>l)^ (the orthogonal 
complement of ker(^4)). Since, x\\ belongs to the random subspace ker(^4) with dimension N — n = 
N(l — 5), Kashin theorem (cf. Theorem IF.1|) implies that there exists a positive constant c\ = c\(8) 
such that 

1 Il^ll2 1 m~ ||2 , 1 n~ i|2 

iv M =iv l|x " 11 + iv l|x±l1 

Hence, by using triangle inequality and Cauchy-Schwarz, we get 

^PH 2 <2 Cl (&) 2 + 2 Cl (^) 2 + lpj| 2 
/||£||iV gci + l 2 

By definition of cost function we have ||x||i < \~ 1 C(x). Further, limit theorems for the eigenvalues of 
Wishart matrices (cf. Theorem IF. 2\i imply that there exists a constant c = c(5) such that asymptot- 
ically almost surely ||Sj_|| 2 < c||^4xj_|| 2 . Therefore (denoting by Cj : i = 2,3,4 bounded constants), 
we have 

>lP £ 2 Cl (Ml) 2 + |||^ 

. f\\x\\i V . 2c 2 ,. _ ||2 2c 2 ,. ||2 
< zci H \\v — Ax i H \\v\\ 

C{x)\ 2 C{x) 2c 2 2 



~ C ^N +2c 2 ^ + w \\Ax + W 
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The claim follows by using the Eq. (|5.1j) to bound C(x)/N and using \\Axq + u>|| 2 < o"max(A) 2 ||xo|| 2 + 
\\w\\ 2 < 2iVBi to bound the last term. □ 

5.2 Proof of Lemma 13.41 

First note that equation x l = r/(A*z t ~ 1 + x l ~ l ; 6t-\) of AMP implies 

if x\ i- , 



+ 6 t -i sign(x*) = L4V -1 ]* + x\ \ 
[A***" 1 ] 



(5.2) 



+ x 



t-i 



< 



n-i, 



if x\ = . 



Therefore, the vector sg(C,x 4 ) = As* — A*(y — Ax 1 ) where 

sign(x-) 

eh^z^ + x'r 1 } 
is a valid subgradient of C aX x l . On the other hand, y — Ax 1 = 



if x\£ , 

otherwise, 

- WfZ* -1 . We finally get 



(5.3) 



sg(C,x 4 ) 



1 
1 



[A^ia*-^-!^*^*-^*- 1 )] 



[A0t_ia* - t _i(l - c^AV" 1 ] - A*(z* - z*" 1 ) 



[A0 t _ 



* AAV" 1 ! 



[A -g t _i(l-^)] 



+ 



It is straightforward to see from Eqs. ([52]) and ([53]) that (J) = A(x*~ 1 - x*). Hence, 

|A-0t_i(l-w t )| 1 



"sg(C,x*)|| < — A 



lx t_ x t-ln + ^^l llz t_ z t-l l 



'N" " Bt-WN" VN " 0t_i 

By Lemma [Ol and the fact that <r max (A) is almost surely bounded as TV — > oo (cf. Theorem IF. 2ft . 
we deduce that the two terms A||x* — x t ~ 1 \\/(9 t _i\/ r N) and ^^(A)^* — z t_1 \\ 2 /y/~N converge to 
when iV — > oo and then f — >■ oo. For the third term, using state evolution (see Lemma l4.ip . we 



obtain limjv^oo \\z t \\ /N < oo. Finally, using the calibration relation Eq. (jl.lip . we get 



lim lim 

t—>CG N— >oo 



X-6t-l{l-U)t) 



n-i 



A -0,(1- -E{r/(Xo + nZ;0*)}) 



which finishes the proof. 



□ 



5.3 Proof of Lemma 13.51 

The proof uses the representation (|4.9p . together with the expression (|4.10p for the conditional 
expectation. Apart from the matrices Y t , Q t , X t , M t introduced there, we will also use 



Bi 



h 1 



1? 



h 1 



In this section, since t is fixed, we will drop everywhere the subscript t from such matrices. 
We state below a somewhat more convenient description. 
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Lemma 5.1. For any v G R , we have 

Me = Y(Q*Qy 1 Q*P Q v + M(M*M)~ 1 X*P£v + P^AP^v . (5.4) 

Proof. It is clearly sufficient to prove that, for v = v y + v±, PqV\\ = fy, Pq Vj_ = Vj_, we have 

Ev\\ = Y(Q*Qy 1 Q*v ]l , Ev ± = M(M*M)~ 1 X*v ± . (5.5) 

The first identity is an easy consequence of the fact that X*Q = M*AQ = M*Y, while the second 
one follows immediately from Q*v± = 0,. □ 

The following fact (see Appendix |D] for a proof) will be used several times. 

Lemma 5.2. For any t there exists c > such that, for R G {Q*Q; M*M; X*X; Y*Y}, as N — >• oo 
almost surely, 

c < A min ( J R/iV) < A max ( J R/iV) < 1/c . (5.6) 

Given the above remarks, we will immediately see that Lemma 13.51 is implied by the following 
statement. 

Lemma 5.3. Let S C [N] be given such that \S\ < N(5 — 7), for some 7 > 0. Then there exists 
ai = 01(7) > (independent of t) and a 2 = 0^(7, i) > (depending on t and such that 



"{ min \\Ev + PijAP£v\\ < a 2 &t\ < e 

^ |i>||=l, suppMCS J 



-Na-\ 



, supp(i;)C5 

with probability (over<S t ) converging to 1 as t ->• 00. (With Ev = Y(Q*Q)- 1 Q*P Q v+M(M*M)- 1 X*P^ 

In the next section we will show that this lemma implies Lemma 13.51 We will then prove the 
lemma just stated. 

5.3.1 Lemma 15.31 implies Lemma [3751 

We need to show that, for S measurable on &t and IS"! < N(5 — c) there exist a\ = a\{c) > and 
0-2 = «2(c, t) > such that 



lim F< min min \\Av\\ < 82 f = 0. 

V^oo I \S'\<aiN ||i>||=l,supp(»CSUS' J 



N- 

Conditioning on &t and using the union bound, this probability can be estimated as 
e|p| min min \\Av\\ < a 2 <3 t \\ < 

I I \S'\<aiN ||u||=l,supp(u)CSuS" J J 



< e Nh(ai) E { max p | min \\Av\\ < a 2 &t}\ 

I \S'\<a!N I |M|=l,supp(»CSuS" J J 
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where h(p) = —plogp— (1—p) log(l— p) is the binary entropy function. The union bound calculation 
indeed proceeds as follows 

P{ min X s > <a 2 \6 t }< V F{X S > < a 2 \&t} 

\S'\<N ai 1 f-^ 1 

1 l_ \S'\<Nai 



^\Yl(Ti\ max p { x ^' < a2 l 6 ^ 

\ k J J \S'\<N ai 1 



< e Nh{ai) max P{X S , < a 2 |© t }, 

\S'\<Nai 1 

where Xgi = m^\\ v \\=i }Snp p( v )csuS' ll^ll- Now, fix a\ < c/2 in such a way that h(ai) < q 1 (c/2)/2 
(with a\ defined as per Lemma 15. 3p . Further choose a 2 = a2(c/2,t)/2. The above probability is 
then upper bounded by 

e N ai (c/2)/2 E f max p f min \\Av\\ <-a 2 (c/2,t) 6t\\ ■ 

I \S"\<N(S-c/2) I |[«||=l,supp(«)CSf" 2 J J 

Finally, applying Lemma 15.31 and using Lemma 15 . II to estimate Av, we get 

e ^i/2 E { max e -Aran^ 

|S"|<iV(<S-c/2) J 

This finishes the proof. □ 
5.3.2 Proof of Lemma 1531 

We begin with the following Pythagorean inequality. 

Lemma 5.4. Let S C [iV] 6e given such that \S\ < N(5 — 7), /or some 7 > 0. Recall that Ev = 
Y{Q*Q)~ l Q*P Q v + M(M*M)~ 1 X*P^v and consider the event 

I 2 ^ T II I? D 4 II 2 7 II 7nl l|2 



£1 = \ \\Ev + PfiAP$v\\ > -^\\Ev - P M AP£ v\\ + Vv s.t. \\v\\ = 1 and supp(t;) C S 



-Na 



Then there exists a = a(j) > such that ¥{£\\&t} > 1 — e 

Proof. We claim that the following inequality holds for all u G K , that satisfy ||u|| = 1 and supp(f) C 
S, with the probability claimed in the statement 



\(Ev-P M AP£v,AP£v)\ < Jl-^\\Ev-P M AP£v\\\\AP£v\\. (5.7) 

Here the notation (it, v) refers to the usual scalar product u*v of vectors u and v of the same 
dimension. Assuming that the claim holds, we have indeed 

\\Ev + P^AP^vf > \\Ev - P M AP^v\\ 2 + PPqvH 2 - 2\{Ev - PmAPqv , AP^v)\ 



;|| 2 + \\P^AP^v\\ 2 - 2 Jl - ^ \\Ev - P M AP^v\\ \\AP%v\ 



«/i - ^){||^ - ^mAP^|| 2 + \\apqv\\ 2 ] 
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which implies the thesis. 

In order to prove the claim (15. 7j) . we notice that for any v, the unit vector APq v/\\ APq v\\ belongs 
to the random linear space im(APQ-Ps). Here P$ is the orthogonal projector onto the subspace of 
vectors supported on S. Further im^APqPs) is a uniformly random subspace of dimension at most 
N(5 — 7). Also, the normalized vector (Ev — Pm APqv) /\\Ev — PmAPqv\\ belongs to the linear 
space of dimension at most 2t spanned the columns of M and of B. The claim follows then from a 
standard concentration-of-measure argument. In particular applying Proposition IE. II for 



m = n, mX = N(5 — 7), d = 2t and e = y 1 — ^ — \/l — — 

yields 

( Ev-P M AP£ V AP^ \ r_ ± 



\\\Ev - P M AP£v\\ \\AP£v\\J ' V 25 

(Notice that in Proposition IE. II is stated for the equivalent case of a random sub-space of fixed 
dimension d, and a subspace of dimension scaling linearly with the ambient one.) □ 

Next we estimate the term ||APq?;|| 2 in the above lower bound. 

Lemma 5.5. Let S C [iV] be given such that \S\ < N(5 — 7), for some 7 > 0. Than there exists 
constant c\ = 01(7), c% = 02(7) such that the event 

£■2 = 1 1| APq?; || > ci(7)||Pq u|| Vf such that supp(-y) C , 

holds with probability Pj^lSj} > 1 — e~ Nc2 . 

Proof. Let V be the linear space V = im(PQ P5). Of course the dimension of V is at most N(5 — 7). 
Then we have (for all vectors with supp(w) C S) 

\\APqv\\ > a min (A\ v ) \\P$v\\ , (5.8) 

where A\y is the restriction of A to the subspace V. By invariance of the distribution of A under 
rotation, o" m i n (A|y) is distributed as the minimum singular value of a gaussian matrix of dimensions 
N5 x dim(y). The latter is almost surely bounded away from as TV — > 00, since dim(V) < N(5 — , y) 
(see for instance Theorem IF.2j) . Large deviation estimates [LPRTJ05] imply that the probability 
that the minimum singular value is smaller than a constant 01(7) is exponentially small. □ 

Finally a simple bound to control the norm of Ev. 

Lemma 5.6. There exists a constant c = c(t) > such that, defining the event, 

£ 3 = {\\EP Q v\\ > c(t)\\P Q v\\ , \\EP^v\\ < cCi) -1 !!^!!, for all v£R N }, (5.9) 

we have ¥(£3) — > 1 as N — )■ 00. 

Proof. Without loss of generality take v = Qa for a G M*. By Lemma 15.11 we have || EPq v\\ 2 = 
\\Yaf > A min (y*y)||a|| 2 . Analogously \\Pqv\\ 2 = \\Qa\\ 2 < A max (Q*Q)||a|| 2 . The bound \\EP Q v\\ > 
c(t)\\PQv\\ follows then from Lemma 15.21 

The bound \\EP^ v\\ < c(t) -1 \\Pqv\\ is proved analogously. □ 
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We can now prove Lemma 15.31 as promised. 

Proof of Lemma 15.31 By Lemma 15.61 we can assume that event £3 holds, for some function c = c(t) 
(without loss of generality c < 1/2). We will let £ be the event 

£={ min \\Ev + P^AP£v\\<a 2 (t)\. (5.10) 

for a 2 {t) > small enough. 

Let us assume first that ||Pqw|| < c 2 /10, whence 

\\Ev - PmAP^W > \\EP Q v\\ - \\EP£v\\ - \\P M APjfrv\\ 

> c\\P Q v\\ - (c- 1 + \\A\\ 2 )\\p£v\\ 

> ° ° WAW ° 2 — 2c WAW ° 2 
-2~10~" " 2 10 _ "5"~" " 2 10' 



where the last inequality uses \\Pqv \\ = Jl - \\P^v\\ 2 > 1/2. Therefore, using Lemma 15.41 we get 

2c „ 7l , c 2 US 



-Na 



and the thesis follows from large deviation bounds on the norm ||^4||2 [LedOlj by first taking c small 

c Fy 
5 V 4<5 • 



enough, and then choosing ai2(t) < - ' — 



Next we assume H-Pg^H > c 2 /10. Due to Lemma 15.41 and 15.51 we can assume that events £\ and 
£2 hold. Therefore 

\Ev + PifAP%v\\ > (tiY /2 \\AP^v\\ > f^^d^ll^H, 



which proves our thesis. □ 
5.4 Proof of Lemma 13.61 

The key step consists in establishing the following result, which will be instrumental in the proof of 
Lemma 14.31 as well (and whose proof is deferred to Appendix IC. ID . 



Lemma 5.7. Assume a > a m i n (5) and let {R s ,t} be defined by the recursion with initial 

condition Then there exists constants E>i ; ri > such that for all t > 

|R^-r 2 | < B ie ~ ri ', (5.11) 
|Rt, m -r 2 | < B ie - ri *. (5.12) 

It is also useful to prove the following fact. 

Lemma 5.8. For any a > and T > 0, the T x T matrix Rt+i = {Rs,t}o<s,t<T is strictly positive 
definite. 
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Proof. In proof of Theorem 14.21 we show that 

R st = lim (h s+ \h t+1 ) = lim (m s ,m*), 

almost surely. Hence, Rt+i = <5hm7v->oo(-^r+i-^r+i/-^)- Thus the result follows from Lemma 
E21 □ 



It is then relatively easy to deduce the following. 

Lemma 5.9. Assume a > a m i n (5) and let {R s ,t} be defined by the recursion fl^. TM with initial 
condition {J^.IJ^ . Then there exists constants B2, x<i > such that for all ti,i2 > t > 

|R ilji2 -r 2 | < B 2 e- r2 '. (5.13) 

Proof By triangular inequality and Eq. (|5,lip . we have 

|R*i,t 2 " rl\ < i|Rt 1)tl - 2R tl>t2 + R t2it2 | + Bi e" ri * . (5.14) 

By Lemma 15.81 there exist gaussian random variables Zq , Z\ , Z2 , ■ ■ ■ on the same probability space 
with E{Zt} = and E,{ZfZ s } = Rt jS (in fact in proof of Theorem 14.21 we show that {^i}r>i>o is the 
weak limit of the empirical distribution of {/i 1+1 }t>«>o)- Then (assuming, without loss of generality, 
ti >t\) we have 

|R*i,ti - 2Rii,i 2 + Rt 2 ,t 2 \ = E {(^i - z t 2 ) 2 } 

i,j=ti 
t 2 -l 

< {^{(Z^-Zrf} 1 ' 



1/2 2 



i=ti 

00 



2 



<4B 1 [^e~ rii / 2 

4Bi . f 

< - e 1 1 

-(l_ e -n/2)2 

which, together with Eq. (|5.14p proves our claim. □ 

We are now in position to prove Lemma 13.61 

Proof of Lemma \3.6l We will show that, under the assumptions of the Lemma, limyv-s.oo \ St 2 (j) \ 
Sti(l)\/N < £ almost surely, which implies our claim. Indeed, by Theorem 14.21 we have 



1 1 N 

i=l 



i=l 

1 N 

~ N^L ~N '^^{\x t 2- 1 +A*z t 2- 1 -x t 2\>(l-y)e t2 . 1 , |x'i- 1 + J 4*^'i- 1 -x t i|<(l- 7 )6» t2 _i} 
i=l 

= F{\X + Z t2 ^\ > (1 - 7)^-1, \X Q + Z tl ^\ < (1 - 7)^-1} = P tl ,t 2 , 



25 



where (Z tl ,Z t2 ) are jointly normal with E{Z£} = Rt^, ^{Z^Z^} = Rt u t 2 , ^{ z t 2 } = ^t 2 ,t 2 - (Notice 
that, although the function !{•••} is discontinuous, the random vector {X$ + Z^-x^Xq + Zf 2 -i) 
admits a density and hence Theorem 14.21 applies by weak convergence of the empirical distribution 
of {{x l r l + (A*^- 1 ), , xf- 1 + (A* z ta ~ x )i)}i<i< N .) 

Let a = (1 — 7)ar*. By Proposition 11.21 for any e > and all t* large enough we have |(1 — 
7)6» ti _i - a\ < e for i G {1, 2}. Then 

Ptuta <P{|X + Z t2 _i| >a-e, \X Q + Z tl -i\ <a + e} 

< P{|Z tl _i - Z t2 _!| > 2e} + P{a - 3e < |X + Z tl ^\ < a + e} 



tl-l,*2-l 



+ Rt 2 -l,t2-l] + 



4c 



V2¥rV 



+ -, 



where the last inequality follows by Lemma 15.91 By taking e = e r2 **/ 3 we finally get (for some 
constant C) Pt lt t 2 < C e _r2 **, which implies our claim. □ 
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A Properties of the state evolution recursion 
A.l Proof of Proposition 11.21 

It is a straightforward calculus exercise to compute the partial derivatives 



<9F n 1 r /X o -0\ /-X)-0m 1 (X 



dF, 2 



S 

29 



1 2t r 



x n -e 



x n -e 



T 



-x n -e 



+ 



}. 

(A.l) 



-Xn-e 



d72 (T ' aT) 



, Xo — ar . 



From these formulae we obtain the total derivative 

(l + a 2 )E{ 
-E{( 

Differentiating once more 



X + ar\ L ( Xq — ar 



Xq — ar 

T 

Xq — ar 

T 



(A.3) 



-Xn — ar 



)}■ 



d 2 F 

2\2 



d(r 



it ,ar 



1 r/X \3 



Xq — 0£T\ 



-Xq — ar 



}■ 
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Now we have 

u 3 [4>{u - a) - (j)(-u - a)} > 0, (A.4) 

with the inequality being strict whenever a > 0, u ^ 0. It follows that r 2 i— y F(r 2 ,ar) is concave, 
and strictly concave provided a > and Xq is not identically 0. 
Prom Eq. (|X3|) we obtain 

dF 2 

lim — 7 (r 2 ,ar) = -{(1 + a 2 )$(-a) - aMa)} , (A.5) 
r 2 ^oo dr z o L J 

which is strictly positive for all a > 0. To see this, let /(a) = (1 + a 2 )&(— a) — a 0(a), and notice 
that /'(a) = 2a$(-a) - 20(a) < 0, and /(oo) = 0. 

Since r 2 h-> F(r 2 ,ar) is concave, and strictly increasing for r 2 large enough, it also follows that 
it is increasing everywhere. 

Notice that a t- > f(at) is strictly decreasing with /(0) = 1/2. Hence, for a > a m i n (S), we have 
F(r 2 ,ar) > r 2 for r 2 small enough and F(t 2 ,qt) < r 2 for r 2 large enough. Therefore the fixed 
point equation admits at least one solution. It follows from the concavity of r 2 i— > F(r 2 , olt) that the 
solution is unique and that the sequence of iterates r 2 converge tor*. □ 

A. 2 Proof of Proposition 11.31 

As a first step, we claim that a i— > T 2 (a) is continuously differentiable on (0,oo). Indeed this is 
defined as the unique solution of 

r, 2 = F(r 2 ,an). (A.6) 

Since (r 2 ,a) H> F(r 2 ,an) is continuously differentiable and < ^(t^ot*) < 1 (the second 
inequality being a consequence of concavity plus lim T 2_ ) . 00 ^(r 2 ,ar) < 1, both shown in the proof 
of Proposition II. 2[) . the claim follows from the implicit function theorem applied to the mapping 
(r 2 ,a)^[r 2 -F(r 2 ,a)]. 

Next notice that r 2 (a) — > +oo as a J. a m i n (6). Indeed, introducing the notation = lim r 2_ s . 00 4^( 
we have, again by concavity, 

rl > F(0,0) + F' oo r 2 , 

i-e. r 2 > F(0,0)/(1 - F^). Now F(0, 0) > a 2 , while F^ f 1 as a | a mm (5) (shown in the proof of 
Proposition ll.2|) . whence the claim follows. 

Finally r 2 (a) — > a 2 + E{Aq}/<5 as a — > oo. Indeed for any fixed r 2 > we have F(r 2 ,ar) — > 
a 2 + E{Xq}/(5 as a — > oo whence the claim follows by uniqueness of r*. 

Next consider the function (a, r 2 ) i— >■ g(a,r 2 ) defined by 

5 (a, r 2 ) = ar{l - ^ P{|X + r Z| >«r}}. 

Notice that \(a) = g(a,r* 2 (a)). Since g is continuously differentiable, it follows that a i— )■ A(a) is 
continuously differentiable as well. 
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Next consider a I a m - m , and let 1(a) = 1 — j P{|^"o + t* Z\ > otJ. Since r* — > +00 in this limit, 
we have 

h = lim 1(a) = 1 - I ¥{\Z\ > a min } = 1 - - $(-a min ) . 

Using the characterization of a m { n in Eq. (jl.lOp (and the well known inequality a&(—a) < 0(a) valid 
for all a > 0), it is immediate to show that < 0. Therefore 

lim A(q) = lim err* (a) = —00 . 

Finally let us consider the limit a — > 00. Since r*(a) remains bounded, we have hin^^oo P{|Xo + 
t* Z\ > ar»} = whence 

lim X(a) = lim arAa) = 00. 

□ 

A. 3 Proof of Corollary 11.51 

By Proposition II. 3\ it is sufficient to prove that, for any A > there exists a unique a > a m i n such 
that A(q) = A. Assume by contradiction that there are two distinct such values a±, ai- 

Notice that in this case, the function a(X) is not defined uniquely and we can apply Theorem 11.41 
to both choices a(X) = a± and a(A) = 02- Using the test function ip(x, y) = (x — y) 2 we deduce that 

lim l||x-x || 2 =M{[ti(X + t*Z; an) - X ] 2 } = 5(t 2 - a 2 ) . 

Since the left hand side does not depend on the choice of a, it follows that t*(q!i) = ^(o^)- 
Next apply Theorem 11.41 to the function ip(x,y) = \x\. We get 

lim — = E{\i](X +nZ; ar*)\\ . 

iV->oo iV 

For fixed r*, 9 h-» E{ \t](Xq + t*Z ; #)|} is strictly decreasing in 9. It follows that a\T*(a\) = a2T*(oi2). 
Since we already proved that r*(ai) = r^a^), we conclude a\ = ai- □ 

B Proof of Theorem 14.21 

First note that using representation (|4.2[) we have x t + A*z l = xq — h t+ . Furthermore, using Lemma 
IF.3f b) we have almost surely 

1 N 

lim^ /2^( x o,i ~ h l +1 > x o,i ~ hl +1 ,x ,i) = E^(X - Z s , X - Z u X ) } 

i=l 

= E^(X + Z s ,X + Z t ,X )} 
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for gaussian variables Z s , Z% that have zero mean and are independent of Xq. Define for all s > 
and t > 0, 

R M = lim (h t+ \ h s+1 ) = E{Z t Z s } . (B.l) 

N— too 

Therefore, all we need to show is that for all s, t > 0: R( jS and R( )S are equal. We prove this by 
induction on max(s,i). 

• For s = t = we have using Lemma lF.3l fb) almost surely 

R ,o = hm (h 1 ^ 1 ) = t 2 = a 2 + ^E{X 2 } , 

TV— >oo 

that is equal to Ro,o- 

• Induction hypothesis: Assume that for all s < k and t < k, 

R*,s = R*,s • (B-2) 

• Then we prove Eq. (|B.2p f° r t = k + 1 (case s = k + 1 is similar). First assume s = and 
t = k + 1 in which using Lemma lF.3f c) we have almost surely 

R fc+ io = lim {h k+2 ,h l ) = lim (m k+1 ,m°) 

' N— >oo n— loo 

= lim {b k+1 -w,b°-w)=a 2 + \ lim {q k+ \ q°) 

= o 2 + ±R[[r)(X -Z k \0 k )-X ][-X ]} , 

= a 2 + ^E{[r ] (X + Z k ;9 k )-X ][-X }} , 

where the last equality uses q° = —xq and Lemma IF.3f b) for the pseudo-Lipschitz function 
(h k+l ,x 0ti ) i ^ [r/(xo,i _ h k+1 ;0 k ) - x 0)i ][-x 0)i ]. Here X ~ px and Z k are independent and 
the latter is mean zero gaussian with E{Zf} = R fe)fe . But using the induction hypothesis, 
Rfc,fc = Rfc,fc holds. Hence, we can apply Eq. (|4.14p to obtain R^o = R*,o- 

Similarly, for the case t = k + 1 and s > 0, using Lemma lF.3f b) (c) we have almost surely 
R fc +i s = lim (h k+2 ,h s+1 ) = lim (m k+ \m s ) 

= lim {b k+1 -w,b s -w) =a 2 + \ lim {q k+ \ q s ) 
n-too o N-¥oo 

= a 2 + Iemxq + z k -e k ) - X ][ V (X + z s _ i; e a _i) - x ]} , 



for Xo ~ px independent of zero mean gaussian variables Z k and Z s ^\ that satisfy 

Rfc,s-i = E{ZjtZ s _i} , R fc fc = E{Z|} , R s _i )S _i = EjZ 2 ^} , 
using the induction hypothesis. Hence the result follows. 
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C Proof of Lemma 14.31 

The proof of Lemma 14.31 relies on Lemma 15.71 which we will prove in the first subsection. 
C.l Proof of Lemma 15.71 

Before proving Lemma 15.71 we state and prove the following property of gaussian random variables. 

Lemma C.l. Let Z\ and Z 2 be jointly gaussian random variables with E(Z 2 ) = = 1 and 

E(ZiZ 2 ) = c > 0. Let L be a measurable subset of the real line. Then E(Z\ £ /, Z 2 £ /) is an 
increasing function of c € [0, 1]. 

Proof. Let {X s } s€ ir be the standard Ornstein-Uhlenbeck process. Then (Zi,Z 2 ) is distributed as 
(Xo,X t ) for t satisfying c = e~ 2t . Hence 

F(Zx El, Z 2 eL) = E[f(X )f(X t )] , (C.l) 

for / the indicator function of /. Since the Ornstein-Uhlenbeck process is reversible with respect to 
the standard gaussian measure /iQ, we have 

E[f(X )f(X t )} =X> _M (^,/)£ G = E c ^'-f)L (C-2) 

with < Ao < Ai < . . . the eigenvalues of its generator, {ipe}e>o the corresponding eigenvectors and 
( • , • the scalar product in L 2 (/uq)- The thesis follows. □ 

We now pass to the proof of Lemma 15.71 



Proof of Lemma 5.7. It is convenient to change coordinates and define 

yt,i = Rt-i,t-i = r t 2 _! , 2/4,2 = Rt,t = r| , 2/4,3 = Rt-i,t-i - 2R t)t -i + Rt,t ■ (C.3) 

The vector y t = (2/4,1; 2/4,2; 2/4,3) belongs to R;j_ by Lemma f5T8l Using Eq. (|4.13p . it is immediate to 
see that this is updated according to the mapping 

yt+i = G(y t ) , 

Gi(jft) = 2/4,2, (C.4) 
G 2 (yt) = a 2 + ^E{[ V (X + Z t ;a^)-X } 2 }, (C.5) 

G 3 (vt) = ^E{[n(Xo + Z t] a^)-r ! (X + Z t ^;a^yU)} 2 }. (C.6) 

where (Zt,Zt-i) are jointly gaussian with zero mean and covariance determined by E{Z 2 } = 
E{Z^_ 1 } = y t; i, E{(Z t - Z t -i) 2 } = 2/4,3- This mapping is defined for y t;3 < 2(y t) i + 2/4,2)- 

Next we will show that by induction on t that the stronger inequality 2/^3 < (yt,i +2/4,2) holds for 
all t. We have indeed 

2 

2/t+i,i + 2/4+1,2 - 2/4+1,3 = 2cj 2 + - E{r/(Xo + Z t ; a^/yj^) rj(X + Zt-i; a^/yT^)} - 
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Since K{Z t Z t ^\} = (yt : i + yt,2 — 2/t,3)/2 and x t- > rj(x;6) is monotone, we deduce that yt^ < 
(yt,i + 2/4,2) implies that Z t , Z t -i are positively correlated. Therefore E{t](Xq + Z t ; Oi-Jyt^) T)(Xq + 
Z t -i; a^/yjj)} > 0, which in turn yields yt+1,3 < (yt+1,1 + Vt+ifi)- 
The initial condition implied by Eq. (I4.14j) is 

yi,i = a 2 + ^E{X 2 }, 

J/1,2 = <? 2 + \ + Z ; Oq) - X } 2 } , 

y 1 , 3 = ^E{7 ] (X + Z ;6 ) 2 }, 

It is easy to check that these satisfy 7/1,3 < 2/1,1+2/1,2- (This follows from E{Xo[Xo — r?(Xo + Zo; #0)]} > 
because xo ^ xo — E,zv( x o + Zo'i@o) is monotone increasing.) We can hereafter therefore assume 

yt,3 < yt,i + yt,2 for all t. 

We will consider the above iteration for arbitrary initialization y (satisfying y ,3 < 2/o,i + 2/0,2) 
and will show the following three facts: 

Fact (i). As t — > 00, yt,i,yt,2 T *- Further the convergence is monotone. 

Fact (ii). If y ,i = 2/0,2 = tI and 2/0,3 < 2rf , then y t ,i = y t , 2 = t 2 for all t and y t>3 -+ 0. 

Fact (in). The jacobian J = Jg(2/*) of G at = (t*,t* ,0) has spectral radius <x(J) < 1. 

By simple compactness arguments, Facts (i) and (ii) imply y t — > as t — > 00. (Notice that yt,3 
remains bounded since yt,z < (y^i + 2/4,2) and by the convergence of yt,i,yt,2-) Fact (Hi) implies that 
convergence is exponentially fast. 

Proof of Fact (i). Notice that yt, 2 evolves independently by yt+1,2 = ^(yt) = F(j/2,t, a^/2/2,t) 5 
with F( • , •) the state evolution mapping introduced in Eq. (jl.6p . It follows from Proposition 11.21 
that yt t 2 —> t 2 monotonically for any initial condition. Since yt+1,1 = yt,2, the same happens for y%^\. 

Proof of Fact (ii). Consider the function G*(x) = Gz(t 2 ,t 2 ,x). This is defined for x € [0,4r 2 ] 
but since yt,3 < yt,i +yt,2 we will only consider G* : [0, 2r*] — > M.+ . Obviously G*(0) = 0. Further G* 
can be represented as follows in terms of the independent random variables Z, W ~ N(0, 1): 

G*(x) = Ie{[ V (X + yV, 2 - x/AZ + (Vx~/2)W; an) - 7](X + y 'r 2 - x/AZ - (yfa/2)W; «n)] 2 }(.C7) 
o 

A straightforward calculation yields 

GUx) = ^E{r/(X + Z t ; an) V '(X + Z t _ i; or,)} = ^P{|X + Z t | > on, |X + Z t -i\ > an} , 


where 2^-1 = y^ - x 2 /AZ + (x/2)W, Z t = yV| - x 2 /4Z - (x/2)W. In particular, by LemmaEH 
x i—)- G*(x) is strictly increasing (notice that the covariance of Z t -i and Z t is r 2 — (x/2) which is 
decreasing in x). Further 

G;(0) = ^E{ V '(X +nZ;an)}. 
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Hence, since A > using Eq. (jl.lip we have G'(0) < 1. Finally, by Lemma IC.ll x i— > G'(x) is 
decreasing in [0,2-r*). It follows that yt t 3 < G'(0)'yo,3 — > as claimed. 

Proof of Fact (in). From the definition of G, we have the following expression for the Jacobian 

1 

J G (y*) = | F'(t2) 
a G',(0) b 



where with an abuse of notation we let F'(r^) = ^5-F(r 2 ,ar) 
the above matrix, we get 



. Computing the eigenvalues of 

r 2 — rr-1 



a(J) =max{F / (r* 2 ), Gj,(0)}. 
Since G*(0) < 1 as proved above, and F(t*) < 1 as per Proposition II. 2\ the claim follows. □ 



C.2 Lemma 15.71 implies Lemma [4TTT1 

Using representations (14, 4h and (|4.3p (i.e., b l = w — z t and q l = xq — x) and Lemma IF. 3( c) we 
obtain, 



l im L\\ z t+i- z tf 2 = lim ±|| 6 m_ 6 t||2 



n— >oo 71 n— >oo /j 

ll„t+l „t| 

12 



Jim lll^ 1 -- 

= - lim -^\\x t+1 - ^Wl , 

5 N^oo N 

where the last equality uses q l = x t — xq. Therefore, it is sufficient to prove the thesis for — x l ||2- 
By state evolution, Theorem 14.21 we have 

lim h\x t+1 - x% = E{[r)(X + ZfA) - r,(X + Z t _i; t -i)] 2 } 

iV-s-oo iV 

< 2(d t - e t ^) 2 + 2E{(Z t - Z t ^) 2 } = 2{9 t - e t ^f + 2(R M - 2R M _x + R t _i, t _i) 

The first term vanishes as t — > oo because #j = art —> ar* by Proposition 11.21 The second term 
instead vanishes since Rj 5 t — > t*, Rt,t-i — » by Lemma ISTF! 



D Proof of Lemma 15.21 

First note that the upper bound on X max (R/N) is trivial since using representations (|4.7p . (|4.8p . 

= ft(h ,Xo), m = gt(b ,w) and Lemma IF. 3( c) (d) all entries of the matrix R/N are bounded 
as N — > oo and the matrix has fixed dimensions. Hence, we only focus on the lower-bound for 

The result for R = M*M and R = Q*Q follows directly from Lemma lF.3( g) and Lemma 8 of 
[BM10] . 

For R = Y*Y and R = X*X the proof is by induction on t. 
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For t = 1 we have Y t = b° and X t = h 1 + £,oQ° = h 1 — xq. Using Lemma TF.3l fb) (c) we obtain 
almost surely 

lim = 6 lim (6°, b°) = lim (g°, q°) = E{X 2 } , 

lim = lim {h l - x°, h 1 - x°) = E{(r Z + X ) 2 } = a 2 + ^±±E{X 2 } , 

where both are positive by the assumption P{Xo ^ 0} > 0. 

Induction hypothesis: Assume that for all t < k there exist positive constants cx(t) and cy(i) 
such that as N — ¥ oo 

cy(i)<A min (^), (D.l) 

cx(0<A min (^^). (D.2) 

Now we prove Eq. (lLUi) for i = fe + 1 (proof of (jLT2j) is similar). We will prove that there is a 
positive constant c such that as iV — > oo, for any vector aj G R': 

(Y t at,Y t at) > c\\a t \\l . 

First write at = (oi, . . . , aj) and denote its first i — 1 coordinates with at_i. Next, we consider 
the conditional distribution ^4|et_i • Using Eqs. (|4.9p and (|4.10p we obtain (since Y t = AQ t ) 

Y t at\e t -i = 4|6 t _i(<2t-lOt_i + ai g* _1 ) 

= E t _i(Q t _i a t _i + atq 1 ' 1 ) + atP^ t -i^± 1 ■ 
Hence, conditional on &t-l we have, almost surely 

lim {Y t a t , Y t at) = lim -J- ||y t _i a t _i + atE^q 1 ' 1 \\ 2 + a? lim (g*" 1 ,^ 1 ) . (D.3) 

Here we used the fact that A is a random matrix with i.i.d. N(0, 1/n) entries independent of 
(cf. Lemma |F.4|) which implies that almost surely 

- lim iV ^oo(i :, A J 7 t _ 1 igl~ 1 , J PM t -i^l" 1 > = lim 7V ^oo(gl~ 1 ,gl~ 1 ), 

- lim 7V ^ 00 (P A - 1 7 t _ i i^- 1 ,yt_i Et-i + atb 1 " 1 + atX^m 1 ' 2 } = 0. 

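From Lemma F.3(g) we know that $\lim_{N\to\infty} \langle q_{\perp}^{t-1}, q_{\perp}^{t-1}\rangle$ is larger than a positive constant $\rho$. Hence, from representation (D.3) and the induction hypothesis (D.1),

$$\lim_{N\to\infty} \langle Y_t a_t, Y_t a_t\rangle \ge \lim_{N\to\infty} \Big(\sqrt{c_Y(t-1)}\,\|a_{t-1}\| - \frac{|a_t|}{\sqrt{N}}\,\big\|b^{t-1} + \lambda_{t-1} m^{t-2}\big\|\Big)^2 + a_t^2\,\rho.$$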

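To simplify the notation let $c_t = \lim_{N\to\infty} N^{-1/2}\,\|b^{t-1} + \lambda_{t-1} m^{t-2}\|$. Now if $c_t |a_t| \le \sqrt{c_Y(t-1)}\,\|a_{t-1}\|/2$, then

$$\lim_{N\to\infty} \langle Y_t a_t, Y_t a_t\rangle \ge \frac{c_Y(t-1)}{4}\,\|a_{t-1}\|^2 + a_t^2\,\rho \ge \min\Big(\frac{c_Y(t-1)}{4}, \rho\Big)\,\|a_t\|_2^2, \tag{D.4}$$

which proves the result. Otherwise, $\|a_{t-1}\|^2 < 4 c_t^2 a_t^2 / c_Y(t-1)$, hence $a_t^2 > c_Y(t-1)\,\|a_t\|_2^2 / (4 c_t^2 + c_Y(t-1))$, and we obtain the inequality

$$\lim_{N\to\infty} \langle Y_t a_t, Y_t a_t\rangle \ge a_t^2\,\rho \ge \frac{\rho\, c_Y(t-1)}{4 c_t^2 + c_Y(t-1)}\,\|a_t\|_2^2,$$

which completes the induction argument.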
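The case split above only uses the elementary lower bound that precedes (D.4). The sketch below treats $c_Y(t-1)$, $c_t$ and $\rho$ as arbitrary positive numbers, evaluates the resulting constant $c = \min\{c_Y(t-1)/4,\ \rho,\ \rho\, c_Y(t-1)/(4c_t^2 + c_Y(t-1))\}$, and compares it with the bound on random vectors $a_t$. It checks the algebra of the two cases only, not the probabilistic limits; the function names are hypothetical.

```python
import numpy as np


def induction_constant(c_prev, c_t, rho):
    """Constant c such that the bound preceding (D.4) is >= c * ||a_t||^2.

    c_prev plays the role of c_Y(t-1); c_t and rho are as in the text.
    """
    case_1 = min(c_prev / 4.0, rho)                  # c_t |a_t| <= sqrt(c_prev) ||a_{t-1}|| / 2
    case_2 = rho * c_prev / (4.0 * c_t**2 + c_prev)  # the complementary case
    return min(case_1, case_2)


def lower_bound(a, c_prev, c_t, rho):
    """(sqrt(c_prev) ||a_{t-1}|| - c_t |a_t|)^2 + rho a_t^2 for a = (a_{t-1}, a_t)."""
    head, last = a[:-1], a[-1]
    gap = np.sqrt(c_prev) * np.linalg.norm(head) - c_t * abs(last)
    return gap**2 + rho * last**2


def worst_ratio(c_prev, c_t, rho, dim=4, trials=20000, seed=0):
    """Smallest observed value of lower_bound(a) / ||a||^2 over random vectors a."""
    rng = np.random.default_rng(seed)
    worst = np.inf
    for _ in range(trials):
        # Random directions with random per-coordinate scales, to probe both cases.
        a = rng.normal(size=dim) * rng.exponential(size=dim)
        worst = min(worst, lower_bound(a, c_prev, c_t, rho) / (a @ a))
    return worst


c = induction_constant(c_prev=0.3, c_t=2.0, rho=0.5)
print(c, worst_ratio(0.3, 2.0, 0.5))
```

Since $\rho\, c_Y(t-1)/(4c_t^2 + c_Y(t-1)) \le \rho$, the three-way minimum coincides with the minimum of the first and last terms; it is written out in full to mirror the two cases.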



E A concentration estimate 

The following proposition follows from standard concentration-of-measure arguments. 

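Proposition E.1. Let $V \subseteq \mathbb{R}^m$ be a uniformly random linear space of dimension $d$. For $\lambda \in (0,1)$, let $P_\lambda$ denote the orthogonal projector onto the first $m\lambda$ coordinates of $\mathbb{R}^m$. Define $Z(\lambda) \equiv \sup\{\|P_\lambda v\| : v \in V,\ \|v\| = 1\}$. Then, for any $\varepsilon > 0$ there exists $c(\varepsilon) > 0$ such that, for all $m$ large enough (and $d$ fixed),

$$\mathbb{P}\{|Z(\lambda) - \sqrt{\lambda}| \ge \varepsilon\} \le e^{-m c(\varepsilon)}. \tag{E.1}$$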

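Proof. Let $Q \in \mathbb{R}^{m\times d}$ be a uniformly random orthogonal matrix (i.e., with orthonormal columns). Its image is a uniformly random subspace of $\mathbb{R}^m$, whence the following equivalent characterization of $Z(\lambda)$ is obtained:

$$Z(\lambda) \stackrel{d}{=} \sup\{\|P_\lambda Q u\| : u \in S^d\},$$

where $S^d \equiv \{x \in \mathbb{R}^d : \|x\| = 1\}$ is the unit sphere in $\mathbb{R}^d$, and $\stackrel{d}{=}$ denotes equality in distribution.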

Let $N_d(\varepsilon/2)$ be an $(\varepsilon/2)$-net in $S^d$, i.e. a subset of vectors $\{u^1, \dots, u^M\} \subseteq S^d$ such that, for any $u \in S^d$, there exists $i \in \{1, \dots, M\}$ such that $\|u - u^i\| \le \varepsilon/2$. It follows from a standard counting argument [Led01] that there exists an $(\varepsilon/2)$-net of size $|N_d(\varepsilon/2)| \equiv M \le (100/\varepsilon)^d$. Define

$$Z_{\varepsilon/2}(\lambda) \equiv \sup\{\|P_\lambda Q u\| : u \in N_d(\varepsilon/2)\}.$$

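Since $u \mapsto P_\lambda Q u$ is Lipschitz with modulus 1, we have $|Z(\lambda) - Z_{\varepsilon/2}(\lambda)| \le \varepsilon/2$, and therefore

$$\mathbb{P}\{|Z(\lambda) - \sqrt{\lambda}| \ge \varepsilon\} \le \mathbb{P}\{|Z_{\varepsilon/2}(\lambda) - \sqrt{\lambda}| \ge \varepsilon/2\} \le \sum_{i=1}^{M} \mathbb{P}\big\{\big|\|P_\lambda Q u^i\| - \sqrt{\lambda}\big| \ge \varepsilon/2\big\}.$$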

But for each $i$, $Q u^i$ is a uniformly random vector with norm 1 in $\mathbb{R}^m$. By concentration of measure on $S^m$ [Led01], there exists a function $c(\varepsilon) > 0$ such that, for $x \in S^m$ uniformly random,

$$\mathbb{P}\big\{\big|\|P_\lambda x\| - \sqrt{\lambda}\big| \ge \varepsilon/2\big\} \le e^{-m c(\varepsilon)}.$$

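Therefore we get

$$\mathbb{P}\{|Z(\lambda) - \sqrt{\lambda}| \ge \varepsilon\} \le |N_d(\varepsilon/2)|\, e^{-m c(\varepsilon)} \le \Big(\frac{100}{\varepsilon}\Big)^d e^{-m c(\varepsilon)},$$

which is smaller than $e^{-m c(\varepsilon)/2}$ for all $m$ large enough. □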
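A quick Monte Carlo illustration of Proposition E.1, under the following assumptions: a uniformly random $d$-dimensional subspace is generated as the column space of an $m\times d$ gaussian matrix (orthonormalized by QR), and $Z(\lambda)$ is computed as the largest singular value of the first $\lfloor m\lambda\rfloor$ rows of $Q$, which equals $\sup_{\|u\|=1}\|P_\lambda Q u\|$. The hypothetical helper `m_threshold` makes "for all $m$ large enough" explicit: $(100/\varepsilon)^d e^{-mc} \le e^{-mc/2}$ holds exactly when $m \ge 2d\log(100/\varepsilon)/c$, for a given value $c$ of $c(\varepsilon)$, which the proposition does not specify.

```python
import math

import numpy as np


def sample_Z(m, d, lam, rng):
    """One draw of Z(lam) for a uniformly random d-dimensional subspace of R^m."""
    # The column space of an m x d gaussian matrix is uniformly distributed;
    # QR returns an orthonormal basis Q (m x d) of that space.
    Q, _ = np.linalg.qr(rng.normal(size=(m, d)))
    k = int(m * lam)  # P_lam keeps the first m*lam coordinates
    # sup{||P_lam Q u|| : ||u|| = 1} is the spectral norm of the top k rows of Q.
    return np.linalg.norm(Q[:k, :], ord=2)


def deviation_frequency(m, d, lam, eps, trials=50, seed=0):
    """Fraction of draws with |Z(lam) - sqrt(lam)| >= eps."""
    rng = np.random.default_rng(seed)
    draws = np.array([sample_Z(m, d, lam, rng) for _ in range(trials)])
    return np.mean(np.abs(draws - np.sqrt(lam)) >= eps)


def m_threshold(d, eps, c):
    """Smallest m with (100/eps)^d * exp(-m c) <= exp(-m c / 2)."""
    return math.ceil(2.0 * d * math.log(100.0 / eps) / c)


for m in (50, 500, 5000):
    print(m, deviation_frequency(m, d=3, lam=0.3, eps=0.1))
```

The deviation frequency should drop quickly as $m$ grows with $d$ fixed, consistent with the exponential bound (E.1).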
F Useful reference material



In this appendix we collect a few known results that are used several times in our proof. We also 
provide some pointers to the literature. 

F.1 Equivalence of $\ell_2$ and $\ell_1$ norm on random vector spaces

In our proof we make use of the following well-known result of Kashin in the theory of diameters of smooth functions [Kas77]. Let $L_{n,\upsilon} = \{x \in \mathbb{R}^n \mid x_i = 0,\ \forall\, i \ge n(1-\upsilon) + 1\}$.

Theorem F.1 (Kashin 1977). For any positive number $\upsilon$ there exists a universal constant $c_\upsilon$ such that for any $n \ge 1$, with probability at least $1 - 2^{-n}$, for a uniformly random subspace $V_{n,\upsilon}$ of dimension $n(1-\upsilon)$,

$$\forall\, x \in V_{n,\upsilon}:\quad c_\upsilon \|x\|_2 \le \frac{1}{\sqrt{n}}\,\|x\|_1.$$



F.2 Singular values of random matrices 

We will repeatedly make use of the limiting behavior of extreme singular values of random matrices. A very general result was proved in [BY93] (see also [BS09]).

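Theorem F.2 ([BY93]). Let $A \in \mathbb{R}^{n\times N}$ be a matrix with i.i.d. entries such that $\mathbb{E}\{A_{ij}\} = 0$, $\mathbb{E}\{A_{ij}^2\} = 1/n$, and $n = N\delta$. Let $\sigma_{\max}(A)$ be the largest singular value of $A$, and $\sigma_{\min}(A)$ be its smallest non-zero singular value. Then

$$\lim_{N\to\infty} \sigma_{\max}(A) \stackrel{a.s.}{=} \frac{1}{\sqrt{\delta}} + 1, \tag{F.1}$$

$$\lim_{N\to\infty} \sigma_{\min}(A) \stackrel{a.s.}{=} \frac{1}{\sqrt{\delta}} - 1. \tag{F.2}$$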
We will also use the following fact, which follows from the standard singular value decomposition:

$$\min\{\|Ax\|_2 : x \in \ker(A)^{\perp},\ \|x\| = 1\} = \sigma_{\min}(A). \tag{F.3}$$
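The limits (F.1) and (F.2) and the identity (F.3) are easy to check numerically. The sketch below assumes gaussian entries (a special case of the i.i.d. assumption) and $\delta < 1$, so that $A$ has full row rank almost surely and all $n$ singular values returned by the SVD are non-zero. The function names are hypothetical.

```python
import numpy as np


def extreme_singular_values(N=2000, delta=0.25, seed=0):
    """Largest and smallest non-zero singular values of an n x N matrix, n = delta*N."""
    rng = np.random.default_rng(seed)
    n = int(delta * N)
    A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, N))  # E{A_ij^2} = 1/n
    s = np.linalg.svd(A, compute_uv=False)              # sorted in decreasing order
    return s[0], s[-1]


def check_F3(n=50, N=120, trials=2000, seed=1):
    """Compare min ||Ax|| over random unit x in ker(A)^perp with sigma_min(A)."""
    rng = np.random.default_rng(seed)
    A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, N))
    _, s, Vt = np.linalg.svd(A, full_matrices=False)    # rows of Vt span ker(A)^perp
    C = rng.normal(size=(n, trials))
    C /= np.linalg.norm(C, axis=0)                      # unit coefficient vectors
    X = Vt.T @ C                                        # unit vectors in ker(A)^perp
    sampled_min = np.linalg.norm(A @ X, axis=0).min()
    attained = np.linalg.norm(A @ Vt[-1])               # last right singular vector
    return sampled_min, attained, s[-1]


delta = 0.25
smax, smin = extreme_singular_values(delta=delta)
print(smax, 1 / np.sqrt(delta) + 1)   # compare with the limit in (F.1)
print(smin, 1 / np.sqrt(delta) - 1)   # compare with the limit in (F.2)
```

In `check_F3`, random unit vectors in $\ker(A)^\perp$ never go below $\sigma_{\min}(A)$, and the last right singular vector attains it, which is the content of (F.3).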



F.3 Two Lemmas from [BM10]



Our proof uses the results of [BM10]. We copy here the crucial technical lemma in that paper. Notations refer to the general algorithm in Eq. (4.1). General state evolution defines quantities $\{\tau_t^2\}_{t\ge 0}$ and $\{\sigma_t^2\}_{t\ge 0}$ via

$$\tau_t^2 = \mathbb{E}\{g_t(\sigma_t Z, W)^2\}, \qquad \sigma_t^2 = \frac{1}{\delta}\,\mathbb{E}\{f_t(\tau_{t-1} Z, X_0)^2\}, \tag{F.4}$$

where $W \sim p_W$ and $X_0 \sim p_{X_0}$ are independent of $Z \sim N(0,1)$.
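To make the recursion (F.4) concrete, here is a Monte Carlo sketch that evaluates it for user-supplied $f_t$ and $g_t$. The example choices are hypothetical: $g_t(s, w) = s - w$, $f_t(s, x_0) = \eta(x_0 - s;\theta) - x_0$ with soft thresholding $\eta$ at a fixed threshold $\theta$, a Bernoulli-gaussian $X_0$ and gaussian $W$. The initialization $\sigma_0^2 = \mathbb{E}\{X_0^2\}/\delta$ assumes $q^0 = -x_0$ in the initialization of Lemma F.3.

```python
import numpy as np


def state_evolution(f, g, sigma0_sq, delta, sample_x0, sample_w, T,
                    n_mc=200_000, seed=0):
    """Monte Carlo evaluation of the recursion (F.4).

    tau_t^2   = E{ g(t, sigma_t Z, W)^2 }
    sigma_t^2 = (1/delta) E{ f(t, tau_{t-1} Z, X_0)^2 },  t >= 1,
    with Z ~ N(0, 1) independent of W and X_0.
    Returns (sigma_0^2..sigma_T^2, tau_0^2..tau_{T-1}^2).
    """
    rng = np.random.default_rng(seed)
    sigma_sq, tau_sq = [sigma0_sq], []
    for t in range(T):
        Z, W = rng.normal(size=n_mc), sample_w(rng, n_mc)
        tau_sq.append(np.mean(g(t, np.sqrt(sigma_sq[t]) * Z, W) ** 2))
        Z, X0 = rng.normal(size=n_mc), sample_x0(rng, n_mc)
        sigma_sq.append(np.mean(f(t + 1, np.sqrt(tau_sq[t]) * Z, X0) ** 2) / delta)
    return np.array(sigma_sq), np.array(tau_sq)


def soft(x, theta):
    """Soft thresholding eta(x; theta) (hypothetical example denoiser)."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)


theta, delta, eps, noise_sd = 1.0, 0.5, 0.2, 0.5
f = lambda t, s, x0: soft(x0 - s, theta) - x0
g = lambda t, s, w: s - w
sample_x0 = lambda rng, k: rng.normal(size=k) * (rng.random(k) < eps)
sample_w = lambda rng, k: rng.normal(0.0, noise_sd, size=k)

# E{X_0^2} = eps here, so sigma_0^2 = eps / delta.
sigma_sq, tau_sq = state_evolution(f, g, eps / delta, delta, sample_x0, sample_w, T=5)
print(np.round(sigma_sq, 4), np.round(tau_sq, 4))
```

With $g_t(s, w) = s - w$ and $W$ independent of $Z$, each step satisfies $\tau_t^2 = \sigma_t^2 + \mathbb{E}\{W^2\}$ exactly, which gives a simple check on the Monte Carlo error.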

Lemma F.3. Let $\{q_0(N)\}_{N\ge 0}$ and $\{A(N)\}_{N\ge 0}$ be, respectively, a sequence of initial conditions and a sequence of matrices $A \in \mathbb{R}^{n\times N}$ indexed by $N$ with i.i.d. entries $A_{ij} \sim N(0, 1/n)$. Assume $n/N \to \delta \in (0,\infty)$. Consider sequences of vectors $\{x_0(N), w(N)\}_{N\ge 0}$ whose empirical distributions converge weakly to probability measures $p_{X_0}$ and $p_W$ on $\mathbb{R}$ with bounded $(2k-2)$th moment, and assume:

(i) $\lim_{N\to\infty} \mathbb{E}_{\hat p_{x_0(N)}}(X_0^{2k-2}) = \mathbb{E}_{p_{X_0}}(X_0^{2k-2}) < \infty$.

(ii) $\lim_{N\to\infty} \mathbb{E}_{\hat p_{w(N)}}(W^{2k-2}) = \mathbb{E}_{p_W}(W^{2k-2}) < \infty$.

(iii) $\lim_{N\to\infty} \mathbb{E}_{\hat p_{q_0(N)}}(X^{2k-2}) < \infty$.

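Let $\{\sigma_t, \tau_t\}_{t\ge 0}$ be defined uniquely by the recursion (F.4) with initialization $\sigma_0^2 = \delta^{-1}\lim_{n\to\infty}\langle q^0, q^0\rangle$. Then the following hold for all $t \in \mathbb{N}\cup\{0\}$.

(a)

$$h^{t+1}\big|_{\mathfrak{S}_{t+1,t}} \stackrel{d}{=} \sum_{i=0}^{t-1} \alpha_i h^{i+1} + \tilde{A}^* m_{\perp}^t + \tilde{Q}_{t+1}\,\vec{o}_{t+1}(1), \tag{F.5}$$

$$b^{t}\big|_{\mathfrak{S}_{t,t}} \stackrel{d}{=} \sum_{i=0}^{t-1} \beta_i b^{i} + \tilde{A}\, q_{\perp}^t + \tilde{M}_{t}\,\vec{o}_{t}(1), \tag{F.6}$$

where $\tilde{A}$ is an independent copy of $A$ and the matrix $\tilde{Q}_t$ ($\tilde{M}_t$) is such that its columns form an orthogonal basis for the column space of $Q_t$ ($M_t$) and $\tilde{Q}_t^* \tilde{Q}_t = N\, I_{t\times t}$ ($\tilde{M}_t^* \tilde{M}_t = n\, I_{t\times t}$).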



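(b) For all pseudo-Lipschitz functions $\phi_h, \phi_b : \mathbb{R}^{t+2} \to \mathbb{R}$ of order $k$,

$$\lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} \phi_h(h_i^1, \dots, h_i^{t+1}, x_{0,i}) \stackrel{a.s.}{=} \mathbb{E}\{\phi_h(\tau_0 Z_0, \dots, \tau_t Z_t, X_0)\}, \tag{F.7}$$

$$\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \phi_b(b_i^0, \dots, b_i^t, w_i) \stackrel{a.s.}{=} \mathbb{E}\{\phi_b(\sigma_0 \hat{Z}_0, \dots, \sigma_t \hat{Z}_t, W)\}, \tag{F.8}$$

where $(Z_0, \dots, Z_t)$ and $(\hat{Z}_0, \dots, \hat{Z}_t)$ are two zero-mean gaussian vectors independent of $X_0, W$, with $Z_i, \hat{Z}_i \sim N(0,1)$.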

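(c) For all $0 \le r, s \le t$ the following equations hold and all limits exist, are bounded and have degenerate distribution (i.e. they are constant random variables):

$$\lim_{N\to\infty} \langle h^{r+1}, h^{s+1}\rangle \stackrel{a.s.}{=} \lim_{n\to\infty} \langle m^r, m^s\rangle, \tag{F.9}$$

$$\lim_{n\to\infty} \langle b^r, b^s\rangle \stackrel{a.s.}{=} \frac{1}{\delta}\lim_{N\to\infty} \langle q^r, q^s\rangle. \tag{F.10}$$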

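(d) For all $0 \le r, s \le t$, and for any Lipschitz function $\varphi : \mathbb{R}^2 \to \mathbb{R}$, the following equations hold and all limits exist, are bounded and have degenerate distribution (i.e. they are constant random variables):

$$\lim_{N\to\infty} \langle h^{r+1}, \varphi(h^{s+1}, x_0)\rangle \stackrel{a.s.}{=} \lim_{N\to\infty} \langle h^{r+1}, h^{s+1}\rangle\,\langle \varphi'(h^{s+1}, x_0)\rangle, \tag{F.11}$$

$$\lim_{n\to\infty} \langle b^r, \varphi(b^s, w)\rangle \stackrel{a.s.}{=} \lim_{n\to\infty} \langle b^r, b^s\rangle\,\langle \varphi'(b^s, w)\rangle. \tag{F.12}$$

Here $\varphi'$ denotes the derivative with respect to the first coordinate of $\varphi$.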
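One way to read (F.11) for $r = s$ (our heuristic, not a statement from [BM10]): by part (b), $h^{s+1}$ behaves like $\tau_s Z$ independent of $X_0$, and both sides become gaussian expectations, so the identity reduces to Stein's lemma $\mathbb{E}\{\tau Z\,\varphi(\tau Z, X_0)\} = \tau^2\,\mathbb{E}\{\varphi'(\tau Z, X_0)\}$. The sketch below checks that limiting identity by Monte Carlo for the hypothetical Lipschitz choice $\varphi(s, x_0) = \eta(x_0 + s;\theta)$ (soft thresholding), whose derivative in $s$ is $\mathbb{1}\{|x_0 + s| > \theta\}$ almost everywhere.

```python
import numpy as np


def stein_check(tau=0.8, theta=0.5, eps=0.2, n_mc=1_000_000, seed=0):
    """Monte Carlo estimates of both sides of E{tau Z phi} = tau^2 E{phi'}."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=n_mc)
    X0 = rng.normal(size=n_mc) * (rng.random(n_mc) < eps)  # independent of Z
    s = tau * Z
    u = X0 + s
    phi = np.sign(u) * np.maximum(np.abs(u) - theta, 0.0)  # soft threshold, Lipschitz in s
    dphi = (np.abs(u) > theta).astype(float)               # derivative in s (a.e.)
    return np.mean(s * phi), tau**2 * np.mean(dphi)


lhs, rhs = stein_check()
print(lhs, rhs)
```

Lipschitz continuity is what makes the a.e. derivative sufficient here, which matches the hypothesis on $\varphi$ in part (d).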
(e) For $\ell = k - 1$, the following hold almost surely:

$$\limsup_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} (h_i^{t+1})^{2\ell} < \infty, \tag{F.13}$$

$$\limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} (b_i^{t})^{2\ell} < \infty. \tag{F.14}$$

(f) For all $0 \le r \le t$:

$$\lim_{N\to\infty} \frac{1}{N}\langle h^{r+1}, q^0\rangle \stackrel{a.s.}{=} 0. \tag{F.15}$$



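(g) For all $0 \le r \le t$ and $0 \le s \le t - 1$ the following limits exist, and there exist strictly positive constants $\rho_r$ and $\varsigma_s$ (independent of $N$, $n$) such that almost surely

$$\lim_{N\to\infty} \langle q_{\perp}^r, q_{\perp}^r\rangle > \rho_r, \tag{F.16}$$

$$\lim_{n\to\infty} \langle m_{\perp}^s, m_{\perp}^s\rangle > \varsigma_s. \tag{F.17}$$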

It is also useful to recall some simple properties of gaussian random matrices. 

Lemma F.4. For any deterministic $u \in \mathbb{R}^N$ and $v \in \mathbb{R}^n$ with $\|u\| = \|v\| = 1$ and a gaussian matrix $\tilde{A}$ distributed as $A$ we have

(a) $v^* \tilde{A} u \stackrel{d}{=} Z/\sqrt{n}$ where $Z \sim N(0,1)$.

(b) $\lim_{n\to\infty} \|\tilde{A} u\|^2 = 1$ almost surely.

(c) Consider, for $d \le n$, a $d$-dimensional subspace $W$ of $\mathbb{R}^n$, an orthogonal basis $w_1, \dots, w_d$ of $W$ with $\|w_i\|^2 = n$ for $i = 1, \dots, d$, and the orthogonal projection $P_W$ onto $W$. Then for $D = [w_1|\dots|w_d]$, we have $P_W \tilde{A} u \stackrel{d}{=} D x$ with $x \in \mathbb{R}^d$ that satisfies $\lim_{n\to\infty} \|x\| \stackrel{a.s.}{=} 0$ (the limit being taken with $d$ fixed). Note that $x$ is $\vec{o}_d(1)$ as well.
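The three statements of Lemma F.4 can be illustrated by simulation. The sketch assumes specific deterministic vectors ($u$ with equal coordinates, $v = e_1$) and the coordinate subspace $W$ spanned by $\sqrt{n}\,e_1, \dots, \sqrt{n}\,e_d$. For this basis $D^*D = n I_d$, so the projection coefficients are $x = D^*\tilde{A}u/n$. The function name `lemma_f4_samples` is hypothetical.

```python
import numpy as np


def lemma_f4_samples(n=400, N=200, d=3, trials=200, seed=0):
    """Draws of sqrt(n) v^* A u, ||A u||^2 and ||x|| (with P_W A u = D x)."""
    rng = np.random.default_rng(seed)
    u = np.ones(N) / np.sqrt(N)          # deterministic unit vector in R^N
    v = np.zeros(n)
    v[0] = 1.0                           # deterministic unit vector in R^n
    D = np.zeros((n, d))
    D[np.arange(d), np.arange(d)] = np.sqrt(n)     # orthogonal basis with ||w_i||^2 = n
    scaled_vAu, sq_norms, x_norms = [], [], []
    for _ in range(trials):
        A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, N))
        Au = A @ u
        scaled_vAu.append(np.sqrt(n) * (v @ Au))   # (a): should look like N(0, 1)
        sq_norms.append(Au @ Au)                   # (b): should be close to 1
        x_norms.append(np.linalg.norm(D.T @ Au / n))  # (c): should be close to 0
    return np.array(scaled_vAu), np.array(sq_norms), np.array(x_norms)


z, sq, xn = lemma_f4_samples()
print(z.mean(), z.std(), sq.mean(), xn.max())
```

Each coordinate of $x$ here equals $(\tilde{A}u)_i/\sqrt{n}$, which has variance $1/n^2$; this makes the vanishing of $\|x\|$ in part (c) quantitative for this particular basis.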